

November, 1957 




IRE TRANSACTIONS
on Reliability and Quality Control


Published by the Institute of Radio Engineers, Inc., for the Professional Group 
on Quality Control, 1 East 79th Street, New York 21, New York. Responsibility 
for the contents rests upon the authors, and not upon the IRE, the Group or 
its members. Individual copies available for sale to IRE-PGRQC members at 
$0.90, to IRE members at $1.35 and to nonmembers at $2.70. 


© 1958 - The Institute of Radio Engineers, Inc.
All rights, including translation, are reserved by the IRE. Requests for republication privileges
should be addressed to the Institute of Radio Engineers, 1 East 79th Street, New York 21, N. Y.


PRINTED IN THE U.S.A.


MATHEMATICAL MODELS FOR DETERMINATION OF 
EFFICIENT TROUBLESHOOTING ROUTES* 


Arthur J. Hoehn and Eli Saltz 
Air Force Personnel and Training Research Center 
Lowry Air Force Base, Colorado 


The complexity of modern equipment used by industry and by the military has re- 
sulted in a serious problem of providing adequate maintenance. Of the various 
tasks which must be performed in maintenance of such equipment, the most diffi- 
cult is troubleshooting, or the location of malfunctions. Under current methods 
of training and utilizing maintenance technicians, it is generally considered 
necessary that the technicians have a comprehensive grasp of the intricate inter-
relationships between the parts of equipment and the way the parts function as a
system. Knowledge of these interrelationships often requires a background of
high level physics and engineering. Despite this, technical schools, both in in-
dustry and in the military, have often been given the almost impossible task of
providing such knowledge in short courses of a year or less.


In the face of this situation, various approaches to solution of this prob- 
lem are being attempted. The present paper pertains to two methods of dealing 
with the problem through simplification of the troubleshooting tasks of mainten- 
ance personnel. These two approaches to simplification of maintenance require- 
ments are (a) the design and utilization of more or less automatic testing equip- 
ments, and (b) the improved design and use of on-the-job performance supports in 
the form of easy-to-follow job instructions. 


The reader is probably well aware of the current attention being given to 
the development of testing devices which permit the maintenance technician to 
isolate malfunctions largely by positioning switches and reading dials on a pan-
el. Considerably less attention has been directed to the possibilities of well
designed and organized job instructions or "performance guides," but it seems 
possible to increase the performance capabilities of relatively inexperienced 
mechanics and to minimize training required for many maintenance positions by 
utilizing detailed step-by-step troubleshooting guides. Guides of this nature, 
which present in detail the efficient behavior routes for locating most mal- 
functions in particular equipment systems, seem capable of greatly reducing the 
amount of information and the complexity of the concepts which the mechanics must 
learn and retain. They also provide a means of bypassing some of the difficult 
problem-solving processes which relatively few mechanics can properly accomplish. 


Both in the design of more or less automatic check-out equipment and the de-
sign and construction of troubleshooting guides, there is a requirement for tech-
niques of isolating optimal checking routes for the troubleshooting process. The
identification of such optimal checking routes (i.e., statements of the optimal
selection and sequencing of troubleshooting checks) is a complex mathematical
problem. Research is at present under way on possible solutions to this problem
using sequential decision theory, information theory, and perhaps graph theory.
However, several interim solutions can be suggested that should prove useful.


*This report is based on work done under ARDC Project 7709, Task 37304, in sup-
port of the research and development program of the Air Force Personnel and
Training Research Center, Lackland Air Force Base, Texas. Permission is granted
for reproduction, use, and disposal in whole or in part by or for the United
States Government. The writers wish to thank Prof. S. I. Pearson, College of
Engineering, University of Colorado, for reviewing the manuscript of this paper.


In the present paper we shall first examine the general characteristics of 
several interim solutions. This will include examples of how the solutions might 
be applied in specific, simple situations. Then we shall present a condensed 
table indicating the optimal conditions for use of each solution. 


INFORMATION PROCESSING IN TROUBLESHOOTING 


Before dealing with the specific solutions, we shall consider some of the 
general characteristics of the systematic troubleshooting process. This process 
involves selection and performance of a sequence of equipment checks. Each check 
serves to isolate an existing trouble into a smaller and smaller subset of the 
total population of N faults which could possibly occur in the equipment. 


The maintenance man should, in each instance of equipment malfunction, 
choose and accomplish a set of checks to determine which of the possible faults 
is actually occurring. For any particular check to contribute to the isolation 
of the actual trouble, the check must lead to a reduction in the number of pos-
sible faults which remain to be considered. Results of a single informative
check may eliminate from consideration as few as one of the possible faults. In
some situations, a check may eliminate as many as N-1 of the possible faults.
(In the latter case, a single check results in identification of the actual
trouble.)


Each check in a systematic troubleshooting process is in itself a sequence 
of behaviors that can be made routine. However, the set of checks utilized in 
troubleshooting will vary from one instance of malfunction to another according
to what the cause or source of malfunction is. Each relevant or informative
check terminates in the noting of certain feedback information. The trouble-
shooter who is working without a troubleshooting guide must interpret this in-
formation and use it as a partial basis for choosing the next check. In essence,
the feedback information indicates the possible faults which can now be elimin-
ated from consideration. Thus the information restricts the areas of the equip-
ment with which the next check should be concerned. Choice of the next check
should be based on information such as:


1. The interrelationships of the components which are still possible causes 
of the trouble. These interrelationships will determine the amount of in- 
formation that can be gotten from any given subsequent check. 


2. The relative probabilities of malfunction for the possible causes.


3. The relative worktimes involved in checking the possible causes.


4. The principles for combining the above information so as to determine a
most efficient next check.


Obviously in a complex equipment system the problem of processing the above 
information so as to come up with an optimal next check may be formidable. Ulti-
mately, specification of the behavioral content of troubleshooting guides will
involve (a) specification of the behaviors involved in performing each individual
check which can be made routine, and (b) specification of the sequence of checks
which should be performed upon the occurrence of each possible malfunction. The
second problem appears to be the more complex of the two, and it is with this
problem that the present paper is concerned.


DETERMINING EFFICIENT ROUTES 


Within the general concept of the troubleshooting process outlined above, 
there are several principles or strategies which can be followed as interim so- 
lutions. These solutions can be characterized as the worktime-probability solu- 
tion, the half-split solution, and the half-split on worktime-probability solu- 
tion. They were originally suggested as procedures to be developed by maintenance 
men on each new troubleshooting problem. However, we believe that these ap- 
proaches can also be applied to the problem of programming sequences of checks 
for an entire equipment system. 


Worktime-Probability Solution 


Stolurow? has been most active in the development and use of this technique. 
It is based on knowledge of the relative frequencies of the various possible mal- 
functions and the time required for checking individual components. Stolurow has 
worked out his concept of the efficient course of action in troubleshooting in 
connection with reciprocating engines. According to his analysis, efficient lo- 
cation of defects requires the following sequence of behaviors: 


1. Observing indications of engine performance, particularly dials, and in- 
terpreting them in terms of standards of normality. 


2. Associating patterns of indications both with malfunction conditions and 
with the various faults of system components that could produce these 
conditions. 


3. Planning a checking sequence for the underlying systems on the basis of 
fault probability and worktimes required in checking the possible faults. 


This procedure starts with what the present writers will refer to as opera- 
tional checks. These consist of determining the patterns of readings of four 
dials (manifold pressure, fuel flow, rpm, and TOP) for each of three power set- 
tings. The different patterns of indications point to different malfunction con- 
ditions such as "dead cylinders," "excessive oil consumption," "flooded engine," 
and so forth. Suppose, for example, that the pattern of dial readings is that 
associated with the dead cylinder malfunction condition. This malfunction may 
derive from different systems in the engine and from any one of several possible 
faults within these systems. When the dial patterns are those indicating the
dead cylinder malfunction, the trouble has thus already been isolated into a
relatively small subset of the total population of possible troubles in the en-
gine. The troubles still to be checked for in the case of the dead cylinder
malfunction condition are those listed in Table I.


TABLE I

Dead Cylinders

Ignition System                        Basic Engine
-----------------------------------    -------------------------------
"P" lead shorted to ground             Intake valve stuck open
Open primary or secondary              Exhaust valve stuck open
  transformer coil
Condenser shorted to ground            Cracked cylinder head
Breaker points shorted to ground       Cracked piston head
Breaker points stuck open or closed    Faulty rings
Ignition harness leads shorted         Worn valve seats
  to ground or open
Dead spark plugs                       Worn, burned, and pitted valves
Fouled spark plugs                     Worn valve guides
Relay points in TUSC stuck together    Feathered valves
                                       Stretched valves


The problem now is, having arrived at the malfunction condition and this
relatively small set of possible trouble sources, what should be the strategy
from this point in order to most efficiently isolate the one particular fault
existing in the ignition system or the basic engine? Using the minimum average
time per discovery of a fault as the operational criterion of the efficient
course of action, the proposed procedure is to check the possible troubles as-
sociated with the malfunction in the order of increasing t/p value, where t rep-
resents the time required to check for a particular fault and p the probability
of the fault. Suppose, again, that the dead cylinder dial pattern occurs.
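
One way to see why ordering by increasing t/p serves this criterion is the following sketch. It rests on assumptions that are ours rather than stated at this point in the text: exactly one fault is present, the probabilities p_i sum to one, and each check takes a fixed time t_i and confirms or clears only its own fault, giving no information about any other.

```latex
% Average time to find the fault when checks are made in the order 1, 2, ..., n
E[T] = \sum_{k=1}^{n} p_k \sum_{j=1}^{k} t_j

% Interchanging two adjacent checks a and b changes the average by
E[T]_{a\,\text{first}} - E[T]_{b\,\text{first}} = p_b t_a - p_a t_b

% so making check a before check b is at least as good exactly when
\frac{t_a}{p_a} \le \frac{t_b}{p_b}
```

Repeating the interchange until no adjacent pair is out of order yields the sequence of increasing t/p.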


The first decision is whether to check first the ignition system or the 
basic engine. Suppose also that with this pattern of dial readings there is a 
probability of .80 that the trouble resides in the ignition system, and a prob- 
ability of .20 that it resides in the basic engine. Suppose further that the 
average amount of worktime involved in checking the ignition system is four 
hours, while the average amount of time required for checking the possible 
troubles in the basic engine is ten hours. The corresponding t over p values are 


1. Ignition system: 4 hours/.80 = 5.00

2. Basic engine: 10 hours/.20 = 50.00

Thus, the decision would be made to check the ignition system first. The remain-
ing set of checks to be selected would be based on the t/p values computed for
each of the possible troubles which may exist in the ignition system, given the
dead cylinder dial pattern. The possible faults would be checked in the order of
increasing t/p value.
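
The arithmetic above can be checked with a short Python sketch. The function names `tp_order` and `average_time` are illustrative names of ours, not part of the original procedure. The averaging assumes a single fault, probabilities that sum to one, and checks made one after another until the fault is found.

```python
from typing import List, Tuple

# (name, worktime in hours, probability of fault); values from the example above
Candidate = Tuple[str, float, float]


def tp_order(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates by increasing t/p (worktime divided by probability)."""
    return sorted(candidates, key=lambda c: c[1] / c[2])


def average_time(sequence: List[Candidate]) -> float:
    """Average time to find the fault when checks are made in the given order.

    Assumes exactly one fault, and that a fault is found as soon as its own
    check has been made, so the time spent is the cumulative worktime so far.
    """
    elapsed = 0.0
    total = 0.0
    for _name, t, p in sequence:
        elapsed += t          # time spent up to and including this check
        total += p * elapsed  # weighted by the chance the fault is here
    return total


systems = [("basic engine", 10.0, 0.20), ("ignition system", 4.0, 0.80)]
ordered = tp_order(systems)
print([name for name, _, _ in ordered])
print(average_time(ordered), average_time(ordered[::-1]))
```

Under these assumptions, checking the ignition system first averages about 6 hours per fault found, against about 13.2 hours for the reverse order.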


The worktime-probability technique, like other troubleshooting procedures, 
is a general strategy whereby the population of possible faults is systematically 
reduced until the one existing fault has been identified. As applied to the re-
ciprocating engine situation it determines the checking sequence only beyond the
point where operational checks (readings of dials and interpretations of dial
patterns) have isolated the trouble into a relatively small group of possible
faults. It does not concern itself with the interrelationships between compo-
nents. Rather, it proceeds as though the components were strictly independent of
one another.


In evaluating the t/p solution, certain of its characteristics must be un-
derstood.


1. The solution minimizes the average time for discovering faults only when 
certain assumptions are met. The t/p solution eliminates only one compo- 
nent at a time. If a check at one component could potentially give spe- 
cific information about the functioning of other components (i.e., if the 
components are not independent in their functioning) the t/p solution will 
not use this information; consequently, the t/p solution in such cases may 
not be an optimal one. This sharply restricts the types of equipment for 
which this solution is optimal. 


2. The solution was developed to minimize the average time for discovering 
faults in a system. Even when the assumptions of the t/p solution are met,
the range of times necessary for fault location is likely to be very great 
for a complex system. Most troubles will be found very quickly. The rare 
troubles may take an extremely long time to find. This suggests that the 
t/p solution should not be used with extremely complex equipment. 


3. In order to apply the t/p solution, it is necessary to have relatively re- 
liable empirical data on the worktimes and probabilities associated with 
potential malfunctions. If the worktimes are equal for all checks and if 
probabilities are also equal, the t/p solution will break down into a ran- 
dom checking sequence. 


Half-Split Solution 


While the t/p solution suggested by Stolurow may be said to be an empirical-
systematic technique, the half-split procedure suggested by Miller, Foley, and
Smith+ is a logical-systematic procedure. They have explained the half-split 
method by reference to a straight series chain of components such as is graph- 
ically shown in Fig. 1. However, the method appears to be applicable, at least 
in principle, to much more complex systems of components. The essential feature 
of the half-split method is "that each succeeding check is made at the midpoint 
of the remaining segment of the chain (or within a specified distance, in terms 
of the number of check points, from this point)." 


Consider the system of 12 components depicted in Fig. 1, with the input to
the system at the first component. The first check in this system would be made
between components 6 and 7. This check, in effect, splits the system into two
subsystems and checks components 1 to 6. If the results of this check are within
tolerance limits, components 1 to 6 are functioning properly and the remaining
components, 7 to 12, are, in effect, split into two subsystems by checking be-
tween components 9 and 10. On the other hand, if the check between components 6
and 7 is not within tolerance, the next check would be between components 3 and
4, etc. Under certain assumptions, this technique will, as Miller, Foley, and
Smith point out, result in the same number of checks to isolate a trouble, no
matter what that trouble might be. In other words, this procedure minimizes the
greatest number of checks necessary for identifying any malfunction.


[Fig. 1 - Series-chain half-split method. Two panels, "Equal Times and
Probabilities" and "Equal Times, Indicated Probabilities," each marking the
first check and the second check to be made if the first check is out of
tolerance or within tolerance. Component probabilities legible in the second
panel, left to right: 10%, 5%, 3%, 7%, 5%, 7%, 6%, 7%, 12%, [illegible], 5%.]


Any 12 component system in which the components are connected in series, 
such as illustrated in Fig. 1, will require four checks to eliminate all compo- 
nents but one from consideration as possible faults. If a verification check is 
made on the one remaining component, a total of five checks is required to locate 
any trouble. This contrasts sharply with the t/p solution, which would require a 
minimum of one check (if the first component tested was the malfunctioning compo- 
nent), and a maximum of 12 checks. If the worktime and probability data were not 
known for this system, or if all worktimes and probabilities were equal for the 
12 components, the t/p solution would result in an average of six checks to iso- 
late a trouble, over a large number of malfunctions. Thus it can be seen that if
the assumptions for the t/p solution are not met and the assumptions of the half-
split solution are met, the half-split will be the more efficient procedure. On
the other hand, if the probabilities and worktimes are known and are quite dif-
ferent for the various components, and if the components were independent (i.e.,
there was no flow of energy starting at one component and ending as an output
from the last component), the t/p solution would be more efficient.
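
The comparison for the 12-component chain can be sketched in Python as follows. The function names are ours. The sketch assumes a straight series chain with exactly one fault, a check at the output of a component reading out of tolerance whenever the fault lies at or before that component, and no verification check. The sequential count assumes components are checked one at a time from the input, with every component (including the last) checked directly.

```python
def half_split_checks(n_components: int, faulty: int) -> int:
    """Count half-split checks needed to isolate `faulty` (1-based) in a chain."""
    lo, hi = 1, n_components      # the fault is known to lie in lo..hi
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2      # check between components mid and mid + 1
        checks += 1
        if faulty <= mid:         # out of tolerance: fault is at or before mid
            hi = mid
        else:                     # within tolerance: fault is after mid
            lo = mid + 1
    return checks


def sequential_checks(faulty: int) -> int:
    """Count checks when components are checked one by one from the input."""
    return faulty


n = 12
half = [half_split_checks(n, k) for k in range(1, n + 1)]
seq = [sequential_checks(k) for k in range(1, n + 1)]
print("half-split: max", max(half), "average", sum(half) / n)
print("sequential: max", max(seq), "average", sum(seq) / n)
```

With 12 components the first split falls between components 6 and 7 and the second between 3 and 4 or between 9 and 10, as in the text. Because 12 is not a power of two, the half-split count in this sketch is three or four depending on the fault; the worst case of four agrees with the figure given above. The sequential average under these assumptions is 6.5, in line with the "average of six" quoted in the text; the exact figure depends on whether the last component is checked or inferred by elimination.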


Usually equipment characteristics are such as to permit only an approxima- 
tion of the half-split technique. Consider an illustration of an application of 
the half-split technique to an actual, specific piece of equipment, the C97-C 
landing light system. The schematic for this system is shown in Fig. 2. The 
troubleshooting procedure devised was based on the assumption that no more than 
one malfunction will be present in the equipment at a time, and largely disre-
gards probabilities and worktimes.


The first check consists of moving the light switch to ON with the light in 
retract position. When this is done, nothing should happen, since limit switch D
is open when the light is retracted. If the light comes on, the trouble must re-
side in limit switch D. If the light does not come on, the existing malfunction
may be any one of the possible troubles other than the limit switch. This first 
check is a very inefficient one as assessed on the basis of the half-split prin- 
ciple, but it is very simple to notice if the light comes on, and the switch 
movement is necessary in order to perform the remaining checks. 


Let us assume that on the first check the limit switch is found to be prop-
erly operating; that is, the light does not come on. In this case the second
check is to move the right-hand extend-retract switch to extend. When this is 
done the right light should extend and come on. There are three possible out- 
comes: the light may extend and come on, the light may not extend, or the 
light may extend and not come on. Each of these results provides a basis for 
discarding from further consideration a sizable number of possible troubles. If 
the light extends and comes on, then the troubleshooter need no longer concern 
himself with those possible troubles which are associated with the light circuit 
or the extend circuit; if the light does not extend the troubleshooter can dis- 
card those possible troubles associated with components in the retract circuit 
and the light circuits; and finally, if the light extends but does not come on 
the troubleshooter can eliminate from further consideration the checks needed to 
locate troubles within the extend circuit and the retract circuit.
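
The three outcomes of this second check can be written down as a small lookup, sketched here in Python. The names `SUBSYSTEMS`, `CLEARED_BY_OUTCOME`, and `remaining_subsystems` are illustrative ones of ours. The sketch keeps the guide's assumption that no more than one malfunction is present at a time.

```python
SUBSYSTEMS = {"extend circuit", "retract circuit", "light circuit"}

# Subsystems cleared by each outcome of the second check, as described above.
CLEARED_BY_OUTCOME = {
    "extends and comes on": {"light circuit", "extend circuit"},
    "does not extend": {"retract circuit", "light circuit"},
    "extends but does not come on": {"extend circuit", "retract circuit"},
}


def remaining_subsystems(outcome: str) -> set:
    """Return the subsystems still under suspicion after the given outcome."""
    return SUBSYSTEMS - CLEARED_BY_OUTCOME[outcome]


for outcome in CLEARED_BY_OUTCOME:
    print(outcome, "->", sorted(remaining_subsystems(outcome)))
```

Each outcome clears two of the three subsystems, so a single check leaves only one subsystem to be explored further.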


This check (moving the extend-retract switch to extend and noting whether
the light extends and comes on), like the first check, does not exactly fit the
half-split solution. It divides the population of the remaining possible
troubles into three unequal subsystems (the extend circuit, the retract circuit,
and the light circuit) and provides the basis for eliminating troubles in two of
the three subsystems. A departure from the half-split in making the second
check, as well as some of the other departures which were made in the construc-
tion of the guide, is derived in part from the nature and structure of the equip-
ment and the apparent desirability of utilizing operational checks as a point of
departure in the troubleshooting procedure (a matter to be discussed more fully
at a later point).


[Fig. 2 - Circuit diagram of landing light system. Labels legible in the
original include the 28 v buss, the main circuit breaker panel, a circuit
breaker, and a limit switch.]


The rigorous application of the half-split technique comes in primarily 
after some departure from standards has been observed in performance of the oper-
ational checks. Consider, for example, the checks which follow failure of the
light to extend. In this case, the troubleshooter has isolated the existing 
trouble into the extend circuit of Fig. 2. Note that there remain only eight 
possible troubles to be explored. These troubles are listed below in their 
serial order within the circuit. 


1 ABC - 28 v Buss
2 ABC - circuit breaker
3 ABC - line 3 ABC
4B - extend pole of extend-retract switch
5B - line 5B
6B - line 6B
7B - line 7B
8B - the extend limit switch.


The procedure by which the troubleshooter is directed to identify the
trouble conforms perfectly to the requirements of the half-split technique. The
first check called for is at connection between line 5B and 4B, the extend pole
of the extend-retract switch. If normal voltage is obtained at this point, all
components between 1 ABC, the 28 v Buss, and 4B, the extend pole, are all right;
the trouble must be between 4B and 8B. If normal voltage is not obtained, the
possible trouble must reside between the 28 v Buss and 4B. Figure 3 describes
the complete sequence of possible checks for a trouble in the extend circuit.


If it had been possible to conform to the half-split technique throughout 
the troubleshooting procedure, six checks (including the operational checks and a 
verification check) would be both necessary and sufficient to isolate any trouble. 


The guide for the complete landing light system depicted in Fig. 2 re-
quires a minimum of two checks and a maximum of seven to locate the various
troubles. Thus the nature of the system resulted in some departures from a
strict half-split solution.


Some important characteristics of the half-split technique are summarized as
follows:


1. It appears to be the only systematic approach which is applicable when
probabilities of malfunction are equal or unknown, and the worktimes in-
volved in making checks are equal or unknown.


2. It minimizes the greatest number of checks necessary for isolating any
malfunction.


3. It utilizes the interrelationships between components to sequence checks.
If the components are independent, this technique is not applicable.


[Fig. 3 - Half-split guide for extend circuit. The original is a branching
chart of voltage checks (labeled 2A, 2B, 3A, 3B, and so on), each with "Voltage"
and "No Voltage" outcomes, ending in the possible findings: no power on bus
(1 ABC); circuit breaker (2 ABC) out or defective; line 3 ABC defective;
extend-retract switch (4B) defective; line 5B defective; line 6B defective;
line 7B defective; light assembly (8B extend position) defective.]


Half-Split on Worktime-Probability Solution


In those instances in which there are interrelationships between components
but worktime and/or probabilities are known and are markedly unequal, neither the
t/p solution nor the half-split solution outlined above will produce an optimal
checking sequence. It is possible to come closer to an optimal checking sequence
by splitting on worktime or probability, or a ratio of the two. Figure 1 indi-
cates the construction of a sequence for the case in which worktimes are equal
but probabilities are divergent. A similar solution can be obtained for the in-
stance in which probabilities are equal but worktimes differ.


Starting with a half-split solution, it is possible, at least in some in-
stances, to obtain an improved solution by utilizing combined worktime and prob-
ability information. The worktime and probability data collected should include
probability and worktime for making each of the direct checks on the possible
faults during the appearance of the operational check symptom, and also the
worktime associated with making each of the checks in the initial guide which was
developed according to the requirements of the half-split technique. To illus-
trate the procedure in determining what kinds of improvements can be made in the
initial guide on the basis of the newly acquired probability and worktime data,
let us begin with Fig. 4.


Here the initial guide was set up on the basis of the half-split technique,
and it was possible to make perfect splits in terms of number of possible
troubles so that only four checks (including the verification check) would be
needed by following a half-split method. This perfect applicability of the half-
split need not have characterized the initial guide in order for the present pro-
posed procedures to be applicable. The checks in Fig. 4 are designated by let-
ters. The check at the top of the pyramid (A) refers to the first check in the
half-split process following an operational check. The checks at the base of
the pyramid are checks that would be made if one were simply interested in deter-
mining whether some particular fault actually existed, or if one were verifying
the inference which could be drawn from the half-split check in the box immedi-
ately above. The worktimes associated with each check are entered in the lower
right-hand corner of the boxes representing the checks. Thus making the first
check in the half-split process (check A) would require two hours, making check F
would require six hours, a direct check for fault N would require two hours, etc.


The basic probabilities are those which indicate the relative frequencies 
with which the various troubles are found in the system given the departure from 
standard noted in the superordinate operational check. These probabilities are 
listed in the upper left-hand corners of the checks representing direct or veri- 
fication checks. Thus the probability of trouble H (or the probability of having 
to make check H) is .05, while the probability associated with trouble L is .20, 
and so forth. 


It will be noted that probability values have also been entered for the 
checks involved in the half-split process. These have been derived from the 
probabilities associated with the faults. Thus the probability value for check D 
is the sum of the probabilities for faults H and I (.05 + .10 = .15). These 
probabilities, as well as those in the bottom row of checks, are the probabili- 
ties that a particular check will have to be made. Thus the probability that 
check E will have to be made (assuming the half-split technique is used) is .35, 
and the probability that check A will have to be made (given the symptom which 
showed up in the operational check) is 1.00, since check A would always be made
when the system data given by the operational check occurred.


Having determined the p values and t values for each of the checks in the
initial guide, the next step is to compute t/p values for each check. The t/p
values have been entered in Fig. 4 below the respective checks. The next step is
to identify those situations in which t/p values early in a checking sequence ex-
ceed one or more of those lower in the sequence. In Fig. 4 such a reversal is
found in the sequence A, C, F, M. The t/p values for both check C and check F
are larger than the t/p value for check M. This indicates that it may be desir-
able to omit checks C and F. To test this, a new set of sequences of checks has
been devised to replace the right-hand side of the pyramid of checks presented in
Fig. 4, and this revised set of checks is presented in Fig. 5.
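
The two steps just described, deriving a probability for each half-split check from the faults beneath it and then looking for t/p reversals, can be sketched in Python. The class `Check` and the function `find_reversals` are illustrative names of ours. Only the probabilities of faults H, I, and L and the worktimes of check F and the direct check for N are taken from the text; every other worktime and probability below is a hypothetical value chosen for illustration and should not be read as the content of Fig. 4.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Check:
    name: str
    worktime: float                  # hours needed to make this check
    prob: float = 0.0                # set directly only for direct (bottom-row) checks
    children: List["Check"] = field(default_factory=list)

    def probability(self) -> float:
        """Probability that this check has to be made."""
        if not self.children:
            return self.prob
        return sum(child.probability() for child in self.children)

    def t_over_p(self) -> float:
        return self.worktime / self.probability()


def find_reversals(root: Check) -> List[Tuple[str, str]]:
    """List (earlier, later) pairs where the earlier check has the larger t/p."""
    found: List[Tuple[str, str]] = []

    def walk(node: Check, ancestors: List[Check]) -> None:
        for earlier in ancestors:
            if earlier.t_over_p() > node.t_over_p():
                found.append((earlier.name, node.name))
        for child in node.children:
            walk(child, ancestors + [node])

    walk(root, [])
    return found


# Check D: probabilities of H and I are from the text; worktimes are hypothetical.
check_d = Check("D", 3.0, children=[Check("H", 6.0, 0.05), Check("I", 8.0, 0.10)])
print(round(check_d.probability(), 2))

# Right-hand side of a pyramid. From the text: F takes 6 hours, p(L) = .20,
# and the direct check for N takes 2 hours. All other numbers are hypothetical.
fault_l = Check("L", 6.0, 0.20)
fault_m = Check("M", 1.0, 0.15)
fault_n = Check("N", 2.0, 0.10)
fault_o = Check("O", 3.0, 0.05)
check_f = Check("F", 6.0, children=[fault_l, fault_m])
check_g = Check("G", 2.0, children=[fault_n, fault_o])
check_c = Check("C", 4.0, children=[check_f, check_g])
print(find_reversals(check_c))
```

With these illustrative numbers the sketch reports reversals for the pairs C, M and F, M, which is the pattern the text describes for the sequence A, C, F, M.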


Insofar as the properties of equipment to which the half-split can be ap-
plied permit, this rearrangement has been made so that there will be a continu-
ously increasing order of t over p as one proceeds through the checking sequence.
Where the results of check A initially called for going to check C, under the new
arrangement one would go directly to check M, then check for fault L, and then
(assuming no trouble has yet been found) go to check G and proceed as one would
have done in the initial guide. The half-split on worktime-probability solution
takes into account the interrelationships between components and the effects of
unequal worktimes and probabilities.


[Fig. 4 - Initial guide developed using half-split and assuming equal
times and probabilities. Time and probability data having been determined are
now entered for revision process. Legend: t = work time, p = probability;
boxes mark checks and circles mark faults. t/p values legible beneath the bottom
row, left to right: 120, 80, 26.7, 50, 30, 6.7, 20, 60.]


[Fig. 5 - Guide revised on the basis of t/p values. Same legend as Fig. 4.]


Importance of Operational Checks 


Note that both the procedure for the landing light system and Stolurow's 
procedure for reciprocating engines involved use of "operational" checks as a 
point of departure, and the "half-split" or probability-worktime principles were 
applied only beyond that point where some departure from standards was noted in 
an operational check. What is the nature of operational checks and why do they 
serve as a useful set of initial checks? 


Operational checks are like other checks in the troubleshooting procedure in 
that they are operations that can be made routine, concerned with determining conformance
to, or type and/or degree of departure from, standards. They differ from
other checks primarily in that they ordinarily involve manipulations of controls 
used in actual operation of the equipment and observation of readily available 
feedback information of the type which an operator would note in utilization of 
the equipment. Also, a complete set of operational checks for a particular piece 
of equipment is a means of determining whether the total equipment is operating 
properly. 


More specifically, they provide information as to whether the equipment is 
operating properly (or whether some fault is present), and they permit at least 
a partial isolation of the fault if one is present. As an illustration of these 
features of operational checks, consider again the landing light system. Here 
the operational checks involve a very simply performed set of operations: moving 
switches and determining whether the landing lights go on and off and whether 
they extend and retract. If one of these standard operational outputs is not 
achieved while others are, one can deduce that the malfunction resides in the 
subset of the total population of possible troubles which underlies that opera- 
tional output which, on this occasion, is not being obtained. 


In summary, operational checks are like other checks in the troubleshooting 
process in that they are sequences of operations that can be made routine and
contribute to isolation of a fault. They are usually relatively easy to perform. 
Perhaps even more important from the standpoint of their value as a point of departure
in the troubleshooting process, the deviations from standards noted in 
operational checks tend to be those which are noted in actual operation or util- 
ization of the equipment and are the deviations commonly called symptoms. Where 
the symptom information furnished by the operators is sufficiently reliable, or- 
ganization of the eventual troubleshooting directions with the operational checks 
as points of departure will permit optimum utilization of the reported symptom 
information. In situations where the symptom information reported by operators 
is not sufficiently accurate or complete, the checkout procedure provides a 
simple, systematic, and comprehensive scheme for verifying or identifying the 
symptom information. 


It should not be inferred from what has been said that existing operational 
checking procedures are necessarily the most efficient that could be devised for 
purposes of troubleshooting or even for determining whether a trouble does or 
does not exist in a piece of equipment. At least in some instances it is probable
that more efficient checkout procedures (complete sets of operational
checks) could be identified. From the standpoint of troubleshooting, an effective
checkout procedure should require a minimum of time and other resources, 
determine whether or not the total equipment is working properly, and provide a 
basis for isolating any trouble which may exist, down to some relatively small 
area of the equipment. Insofar as possible, the operational checks would conform 
to the demands of the half-split process, although with existing equipment it is 
unlikely that any close conformance to it will be possible. 


This is admittedly a considerable simplification of the problem of identi- 
fying optimum operational checks, and research is needed to determine a methodol- 
ogy for selecting operational checks and their sequences. Nevertheless, it is 
believed that existing operational checking procedures are often reasonably ade- 
quate and can form a useful point of departure in troubleshooting activity. If 
this is the case, determination of the behavioral content of a troubleshooting 
guide can be anchored on one end by knowledge of the operational checks and the 
decisions which the operational checks make possible, and on the other end by the 
population of possible faults and their probabilities of occurrence. 


Feasibility of Obtaining Worktime and Probability Data 


In actual practice it is probably true that the most usual situation is that 
in which some knowledge is available by which to gauge relative probabilities of 
the various faults and worktimes associated with the various checks. If actual 
empirical data have not been obtained through maintenance of appropriate records, 
it may be possible to make estimates of the relative probabilities and worktimes 
associated with the various possible troubles on the basis of past experience 
with other similar equipment. Of course, it is still a moot question as to the 
extent to which probabilities and worktimes can be accurately estimated before 
the equipment is put into actual use. It would be worthwhile to determine the 
accuracy of such estimates on a sample of various kinds of equipment as a basis 
for determining whether to devise a troubleshooting procedure using these esti- 
mates. Should such estimates prove accurate, the date of constructing optimally 
useful guides might be moved up to the point where such guides might be available 
when the equipment first comes into operational use. Otherwise, interim guides 
may be necessary before the appropriate data become available. 


It should be pointed out that the use of worktime data is a complex affair, 
since there are dependencies between worktimes. Taking the head off an engine 
involves long worktime; checking the valves also involves long worktime. How- 
ever, the choice between removing the head and checking the valves cannot be made 
on the basis of the absolute time to do each, but must take into account the fact 
that the head must be removed before the valve can be checked. If the probability
of head malfunction were low and of valve malfunction high, it might still be 
more efficient to check the head before checking the valve as the additional time 
involved in checking the head is small, once the head is removed. If some checks 
are extremely crucial to a troubleshooting guide, and if the guide is present before
the equipment is finally in production, engineering might reduce the worktime
for these particular checks. 
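
The effect of such a dependency can be shown with a small numerical sketch. All times and probabilities below are hypothetical, and the sketch assumes that the fault lies either in the head or in the valves. Once the head is off, only the additional time of each check enters the comparison.

```python
# Hypothetical figures (minutes and fault probabilities), for illustration only.
T_REMOVE_HEAD = 60.0   # shared preparatory work: the head must come off first
T_CHECK_HEAD = 2.0     # additional time to inspect the head once it is removed
T_CHECK_VALVES = 20.0  # additional time to inspect the valves once the head is removed
P_HEAD = 0.1
P_VALVES = 0.6


def marginal_ratio(extra_time, probability):
    """t/p computed on the additional time only, given the work already done."""
    return extra_time / probability


def expected_time_after_removal(first, second):
    """Expected time to find the fault when it lies in the head or the valves.

    `first` and `second` are (extra_time, probability) pairs; the probabilities
    are renormalized so that exactly one of the two faults is present.
    """
    (t1, p1), (t2, p2) = first, second
    q1 = p1 / (p1 + p2)
    return T_REMOVE_HEAD + q1 * t1 + (1.0 - q1) * (t1 + t2)


head = (T_CHECK_HEAD, P_HEAD)
valves = (T_CHECK_VALVES, P_VALVES)
head_first = expected_time_after_removal(head, valves)
valves_first = expected_time_after_removal(valves, head)
print(marginal_ratio(*head), marginal_ratio(*valves))
print(head_first, valves_first)
```

With these invented figures the head has the smaller marginal t/p even though its fault probability is low, so checking it first gives the smaller expected time.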


Selecting Appropriate Solution 


Characteristics of the equipment and the amount of worktime-probability
information available will determine which of the above solutions to use in a
specific instance. Table II lists the factors to be considered in making a decision
on a particular type of solution. These constitute a rough set of rules which 
are not infallible. On the other hand, these rules should, on the average, prove 
to be more efficient than a chance determination of an appropriate solution. 


TABLE II 


Criteria for the Selection of Appropriate 
T-S Solutions 


Model                        Criteria 


1. Worktime-probability      1. Little relationship between components. 
                             2. Either or both worktime and probability 
                                data known. 


2. Half-split                1. Relationship exists between components. 
                             2. Neither worktime nor probability known. 
                             3. Or even if worktime and probability are 
                                known, if criterion is lowest maximum 
                                number of checks. 


3. Half-split on             1. Relationship exists between components. 
   worktime-probability      2. Either or both worktime and probability 
                                are known. 
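
The criteria of Table II can be read as a simple decision rule. The sketch below is one possible reading, not a statement of the authors' procedure; the function name and arguments are hypothetical, and the case of unrelated components with no data is left undecided because the table gives no rule for it.

```python
def select_solution(components_related, worktime_known, probability_known,
                    minimize_max_checks=False):
    """Return the model suggested by the rough rules of Table II.

    `minimize_max_checks` is True when the criterion is the lowest maximum
    number of checks rather than the lowest expected work time.
    """
    data_known = worktime_known or probability_known
    if not components_related:
        if data_known:
            return "worktime-probability"
        return None  # Table II gives no rule for this combination
    if minimize_max_checks or not data_known:
        return "half-split"
    return "half-split on worktime-probability"


print(select_solution(True, True, False))
```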


As was stated earlier in this paper, the proposed solutions are only interim 
solutions. Research for determining more precise solutions is being carried on 
under sponsorship of the Air Force Personnel and Training Research Center. 


BIBLIOGRAPHY 
1. Miller, R. B., Foley, Jr., J. D., and Smith, P. R.: "Systematic Trouble- 
shooting and the Half-Split Technique." Technical Report 53-21, Human Re- 
sources Research Center, Lackland Air Force Base (July 1953). 
2. Stolurow, L. M., Bergum, B., Hodgson, T., and Silva, J.: "The efficient 
course of action in 'troubleshooting' as a joint function of probability 
and cost." Ed. and Psychol. Meas., 15:462-477 (1955). 




COMPUTER METHODS FOR ESTIMATING WEIBULL PARAMETERS IN 
RELIABILITY STUDIES 


John H. K. Kao* 
Cornell University 
Ithaca, New York 


In an earlier paper (3) which appeared in these Transactions, the author 
showed in the appendixes two methods of estimating the shape and scale parameters 
of a Weibull distribution from a set of life testing data. They are: (I) the 
method of least squares for the transformed data, and (II) the method of maximum 
likelihood for ungrouped data. It was pointed out that since the method of least 
squares was the simpler of the two, it could be used as a first approximation for 
getting the maximum likelihood estimate, which involves solving, by trial and 
error, two simultaneous transcendental equations. As a measure of simplifying 
the computation of the analysis, the author suggested fixing the shape parameter
at m = 1.7 when studying the reliability of electron tubes. This value of 
m = 1.7 was an average value based upon the life experience data, then available 
to the author, of some 2,000 electron tubes. 


With the wide popularity and availability of electronic computers, the above 
simplifications are no longer necessary, though still desirable for reasons 
explained in the text. This paper describes two additional methods for which 
computers are almost indispensable. They are: (III) the method of maximum 
likelihood for grouped data, and (IV) the method of minimized chi-squares. For 
the sake of discussion, the two previous methods (I and II) are briefly reviewed. 
As an illustration, the life testing data for five lots of some 400 electron 
tubes by a large tube manufacturer are treated by all four methods of estimation. 
Comparisons are made on the results and merits of these methods. 


TYPES OF LIFE TESTING DATA 


For economic and other considerations, life tests are usually truncated in 
one of the following two ways: 


Item truncation. In this case the life test is stopped when the rth item of the n 
items originally placed on life test fails. Here r is any integer and 1 ≤ r ≤ n. 
For example, we may have 20 items placed on life test initially and we may choose 
to stop the test as soon as we have had 16 failures or 80 per cent of the orig- 
inal number. 


With item truncation, the life testing data usually consist of the precise
failure ages of the items under test. That is, the observations $x_i$ ($i = 1, 2, \ldots, r$)
are such that $0 < x_1 \le \cdots \le x_r < \infty$, where $x_i$ is the exact failure age of the
$i$th item. There are practical situations where this type of data is available. For 
example, when life testing antifriction bearings it is often possible to note the 
noise level of a bearing in order to decide whether or not a bearing has failed 
and, hence, to record the "exact" failure age of each item. These single ordered 
observations are called ungrouped life testing data. 


* Assistant Professor in engineering statistics and quality control, Department of 
Industrial and Engineering Administration, Sibley School of Mechanical Engineering,
Cornell University. 


Time truncation. In this case the experimenter wishes to stop the life test 
when a certain time $z_k$ has elapsed, regardless of the number of failures which 
have occurred. Here $z_k$ is any real number and $0 < z_k < \infty$. For example, we may 
again have 20 items placed on life test initially and we may choose to stop the 
test when an elapsed time of 5,000 hours has been reached, regardless of whether 
or not we have had 16 failures. 


With time truncation, the life testing data will consist of some conveniently
chosen times of inspection, $z_j$ ($j = 1, 2, \ldots, k$), and the frequencies, 
$f_j$, which are the numbers of failures occurring between times $z_{j-1}$ and $z_j$, where 
$z_{j-1} < z_j$. That is, the observations are pairs of numbers $z_j, f_j$ (for
$j = 1, 2, \ldots, k$). For example, with items such as electron tubes, where the failure 
criterion involves a number of electrical and mechanical tests, it is natural to 
record the life testing data in this fashion. These paired ordered observations 
are called grouped life testing data. 


In view of the above discussion we see that the ungrouped life testing data 
are associated with item truncation and grouped life testing data with time 
truncation. However, for purposes of analysis, we sometimes require data of the 
type just opposite to the above convention. This, of course, involves some 
approximations. For example, in life testing electron tubes where data are 
normally in the grouped form, $z_1, f_1;\ z_2, f_2;\ \ldots;\ z_k, f_k$, we may convert the data into 
the ungrouped form, $x_1, x_2, \ldots, x_r$, by assuming (a) all $f_j$ items which failed 
between $z_{j-1}$ and $z_j$ have a failure age of $x_i = \tfrac{1}{2}(z_{j-1} + z_j)$, and (b) the last 
inspection time $z_k$ equal to $x_r$. The first approximation is customarily used in 
statistical practice whenever the intragroup information is unavailable. The 
second approximation involves assuming that the last truncated failure occurs 
precisely at $x_r$. Both approximations are not unreasonable if $(z_j - z_{j-1})$ are 
small for all $j$. On the other hand, if the data are in the ungrouped form, in 
order to convert them into grouped form, all that is necessary is to properly 
group them. By so doing, of course, some intragroup information in the data 
would be lost. Again a good practice to follow is to choose the time of truncation
$z_k = x_r$, as before. 
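
The two conversions can be sketched in a few lines. The function names are hypothetical and the first interval is assumed to start at $z_0 = 0$; the inspection times in the usage line are invented for illustration.

```python
from bisect import bisect_left


def grouped_to_ungrouped(z, f):
    """Approximate failure ages from grouped data (z_j, f_j), j = 1..k.

    Approximation (a): each of the f_j failures between z_{j-1} and z_j is
    given the midpoint age (z_{j-1} + z_j) / 2, with z_0 = 0.
    Approximation (b): the truncation age x_r is taken equal to z_k.
    Returns (list of failure ages, truncation age).
    """
    ages = []
    previous = 0.0
    for z_j, f_j in zip(z, f):
        ages.extend([0.5 * (previous + z_j)] * f_j)
        previous = z_j
    return ages, z[-1]


def ungrouped_to_grouped(x, z):
    """Count the failure ages x falling in each interval (z_{j-1}, z_j]."""
    f = [0] * len(z)
    for age in x:
        j = bisect_left(z, age)   # first index with z[j] >= age
        if j < len(z):            # ages beyond z_k are not counted as failures
            f[j] += 1
    return f


ages, truncation_age = grouped_to_ungrouped([100.0, 200.0, 300.0], [2, 0, 1])
print(ages, truncation_age)
```

Regrouping the midpoint ages with the same inspection times returns the original frequencies, while the exact ages within each interval are, as the text notes, not recoverable.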


These ways of truncating the life testing data also serve the purpose of 
defining some of the notations of life testing data which were used previously (2-4)
and are to be used again here. 


PREVIOUS METHODS OF ESTIMATION 


I. The Method of Least Squares on Transformed Data 

Denote by $F(x) = 1 - e^{-x^m/x_0}$ the Weibull cumulative distribution with shape 
parameter $m$ and scale parameter $x_0$, where $x$ is the failure age of items in some 
convenient time unit. $F(z_j)$ then gives the probability that any item under test 
will fail on or before time $z_j$. Taking the natural logarithm of $1/[1 - F(z_j)]$
twice, we get

$$\ln \ln \frac{1}{1 - F(z_j)} = -\ln x_0 + m \ln z_j. \tag{1}$$


Using the grouped life testing data of the form $z_j, f_j$ (for $j = 1, 2, \ldots, k$),
denote $F_j = \sum_{i=1}^{j} f_i$, which are the cumulative number of failures occurring on or 
before time $z_j$. Let $S_j$ be the number of survivals still remaining at time $z_j$. 
Clearly $F_j + S_j = n$ for all $j$. It can be shown that $F_j/n$ is an unbiased
minimum-variance estimate of $F(z_j)$. Hence $(n - F_j)/n = S_j/n$ is an unbiased
minimum-variance estimate of $1 - F(z_j)$. Substituting this in Eq. (1), we get 

$$\ln \ln (n/S_j) = -\ln x_0 + m \ln z_j, \qquad j = 1, 2, \ldots, k, \tag{2}$$

which is in terms of the grouped data $z_j, f_j$, and may be used for estimating $m$ 
and $x_0$ (through $\ln x_0$) by the usual method of least squares. 


Since Eq. (1) is a straight line on log vs. log-log paper, a quick and easy 
way of getting a pair of estimates is to plot the k points given by Eq. (2) on 
log vs. log-log paper. A straight line passing through these points fitted by 
eye gives the y-intercept (in log-log direction) as the estimate of $-\ln x_0$ and the 
slope as the estimate of m. 
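
The least squares fit of Eq. (2) can be sketched as follows. This is an illustrative implementation, not the author's program; the function name is hypothetical, and inspection times at which no item has yet failed or none survives are skipped because the double logarithm is undefined there.

```python
import math


def weibull_least_squares(z, f, n):
    """Fit ln ln(n/S_j) = -ln x0 + m ln z_j (Eq. 2) by ordinary least squares.

    z: inspection times z_1 < ... < z_k; f: failures per interval; n: sample size.
    Returns (m, x0).
    """
    xs, ys = [], []
    failed = 0
    for z_j, f_j in zip(z, f):
        failed += f_j
        survivors = n - failed
        if 0 < survivors < n:   # ln ln(n/S_j) is undefined for S_j = 0 or S_j = n
            xs.append(math.log(z_j))
            ys.append(math.log(math.log(n / survivors)))
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                      # slope estimates m
    intercept = mean_y - m * mean_x    # intercept estimates -ln x0
    return m, math.exp(-intercept)
```

The result plays the same role as the line fitted by eye on log vs. log-log paper, and may serve as a starting value for the other methods.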


II. The Method of Maximum Likelihood for Ungrouped Data 

Denote the Weibull density function by 

$$f(x) = \frac{m}{x_0}\, x^{m-1} e^{-x^m/x_0};$$

then the likelihood function for ungrouped life testing data, $x_1, x_2, \ldots, x_r$, 
from a sample of size $n$ where $r \le n$ is 

$$L = \frac{n!}{(n-r)!} \left(\frac{m}{x_0}\right)^{r} \prod_{i=1}^{r} x_i^{m-1} \exp\left\{-\frac{1}{x_0}\left[\sum_{i=1}^{r} x_i^m + (n-r)\,x_r^m\right]\right\}. \tag{3}$$

By putting the first partial derivatives of $\ln L$ with respect to $m$ and $x_0$ equal 
to zero, we get 

$$x_0 = \frac{1}{r}\left[\sum_{i=1}^{r} x_i^m + (n-r)\,x_r^m\right], \tag{4}$$

$$\frac{r}{m} + \sum_{i=1}^{r} \ln x_i = \frac{1}{x_0}\left[\sum_{i=1}^{r} x_i^m \ln x_i + (n-r)\,x_r^m \ln x_r\right],^{*} \tag{5}$$

which, if solved simultaneously by trial and error, give the so-called maximum 
likelihood estimates of the parameters $m$ and $x_0$ for ungrouped data. The above 
reviews the two methods already discussed. 

* In the work referred to in reference 3 of the Bibliography, a plus sign in this
equation was misprinted as a minus sign. 
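
A systematic form of the trial and error solution can be sketched as follows. Eliminating $x_0$ between Eqs. (4) and (5) leaves a single equation in $m$, which the sketch solves by bisection. The bracketing interval for $m$ is an assumption, the function names are hypothetical, and this is not the author's program.

```python
import math


def profile_equation(m, x, n):
    """Eq. (5) with x0 replaced by Eq. (4); zero at the likelihood estimate of m.

    x: the r observed failure ages; n: number of items placed on test.
    """
    r = len(x)
    x_r = max(x)
    s0 = sum(v ** m for v in x) + (n - r) * x_r ** m
    s1 = sum(v ** m * math.log(v) for v in x) + (n - r) * x_r ** m * math.log(x_r)
    return s1 / s0 - 1.0 / m - sum(math.log(v) for v in x) / r


def weibull_mle_ungrouped(x, n, lo=0.05, hi=20.0, tol=1e-10):
    """Bisection on m over the assumed bracket [lo, hi], then x0 from Eq. (4)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_equation(mid, x, n) > 0.0:   # the function increases with m
            hi = mid
        else:
            lo = mid
    m = 0.5 * (lo + hi)
    r = len(x)
    x0 = (sum(v ** m for v in x) + (n - r) * max(x) ** m) / r
    return m, x0
```

Failure ages are best expressed in a unit that keeps $x^m$ within floating-point range over the whole bracket, for example thousands of hours.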


NEW METHODS OF ESTIMATION 


III. The Method of Maximum Likelihood for Grouped Data 


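For the grouped life testing data of the form $z_j, f_j$ ($j = 1, 2, \ldots, k$),
denote by $p_j$ the probability that any item will fail in the time interval $z_{j-1}$ to 
$z_j$; then 

$$p_j = F(z_j) - F(z_{j-1}) \quad \text{and, in particular,}$$
$$p_1 = F(z_1) - F(0) = F(z_1); \quad \text{and}$$
$$p_{k+1} = F(\infty) - F(z_k) = 1 - F(z_k). \tag{6}$$

Substituting $F(z_j) = 1 - e^{-z_j^m/x_0}$, we get 

$$p_j = e^{-z_{j-1}^m/x_0} - e^{-z_j^m/x_0}, \qquad p_{k+1} = 1 - \sum_{j=1}^{k} p_j = e^{-z_k^m/x_0}. \tag{7}$$

The likelihood function is given by Cramér (1, p. 318) as the following multinomial
distribution: 

$$L' = \frac{n!}{\prod_{j=1}^{k+1} f_j!} \prod_{j=1}^{k+1} (p_j)^{f_j}, \quad \text{where} \quad \sum_{j=1}^{k+1} f_j = n \quad \text{and} \quad \sum_{j=1}^{k+1} p_j = 1. \tag{8}$$

In order to maximize $L'$ with respect to $m$ and $x_0$ it is sufficient to maximize 
$L$, defined as follows: 

$$L = \prod_{j=1}^{k+1} (p_j)^{f_j} = (p_{k+1})^{f_{k+1}} \prod_{j=1}^{k} (p_j)^{f_j}. \tag{9}$$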

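Substituting Eq. (7) in Eq. (9), we get 

$$L = \exp\left[-\frac{z_k^m}{x_0}\Bigl(n - \sum_{j=1}^{k} f_j\Bigr)\right] \prod_{j=1}^{k} \left[e^{-z_{j-1}^m/x_0} - e^{-z_j^m/x_0}\right]^{f_j}. \tag{10}$$

Taking the natural logarithm of Eq. (10), we get 

$$\ln L = -\frac{z_k^m}{x_0}\Bigl(n - \sum_{j=1}^{k} f_j\Bigr) + \sum_{j=1}^{k} f_j \ln\left[e^{-z_{j-1}^m/x_0} - e^{-z_j^m/x_0}\right]. \tag{11}$$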


In the above, $z_j, f_j$ ($j = 1, 2, \ldots, k$) are known, hence $\ln L$ is a surface over 
the $m$-$x_0$ plane. The pair of $m$ and $x_0$, say $m^*$ and $x_0^*$, which maximizes $\ln L$ 
(and hence $L$) is called the maximum likelihood estimate of the Weibull parameters 
for grouped data. In previous works (2, 3) formulas were also indicated for obtaining
the maximum likelihood estimate, $\hat{m}$ and $\hat{x}_0$, in the case of grouped data by 
modifying Eqs. (4) and (5). The estimates $m^*$ and $x_0^*$ given by the present method 
will be different from $\hat{m}$ and $\hat{x}_0$, but the difference is expected to be small. It 
is difficult to differentiate either Eq. (10) or Eq. (11) with respect to $m$ and 
$x_0$, hence the trial and error method must be used to maximize them. With an 
electronic computer such as the IBM 650, either one of these equations may be
programmed for maximization with respect to $m$ and $x_0$ in order to obtain $m^*$ and $x_0^*$. 
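
A trial and error maximization of Eq. (11) might be organized as a coarse search followed by a finer one around the best trial. The sketch below is an assumption about how such a search could look, not a description of the IBM 650 program; the function names, the number of steps, and the search ranges are hypothetical.

```python
import math


def log_likelihood(m, x0, z, f, n):
    """ln L of Eq. (11) for inspection times z_1..z_k, failures f_1..f_k, sample size n."""
    survivors = n - sum(f)
    total = -(z[-1] ** m / x0) * survivors
    previous = 0.0                                   # z_0 = 0
    for z_j, f_j in zip(z, f):
        p_j = math.exp(-(previous ** m) / x0) - math.exp(-(z_j ** m) / x0)
        if f_j > 0:
            if p_j <= 0.0:
                return -math.inf                     # trial point rules out observed failures
            total += f_j * math.log(p_j)
        previous = z_j
    return total


def grid_search(z, f, n, m_range, x0_range, steps=40, stages=2):
    """Coarse mesh first, then a finer mesh centered on the best trial so far."""
    (m_lo, m_hi), (x_lo, x_hi) = m_range, x0_range
    best = None
    for _ in range(stages):
        dm = (m_hi - m_lo) / steps
        dx = (x_hi - x_lo) / steps
        for i in range(steps + 1):
            for j in range(steps + 1):
                m, x0 = m_lo + i * dm, x_lo + j * dx
                value = log_likelihood(m, x0, z, f, n)
                if best is None or value > best[0]:
                    best = (value, m, x0)
        _, m_best, x_best = best
        m_lo, m_hi = max(m_best - dm, 1e-6), m_best + dm
        x_lo, x_hi = max(x_best - dx, 1e-6), x_best + dx
    return best[1], best[2]
```

Centering the first mesh on the Method I estimate keeps the number of trials small.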


IV. The Method of Minimized Chi-Squares for Grouped Data 


Following the definition of $p_j$ given by Eq. (6), $np_j$ (for $j = 1, 2, \ldots, k+1$) 
will be the expected number of failures between inspection times $z_{j-1}$ and $z_j$. If 
$p_j$ were known from other considerations, then, for large $n$, provided that $np_j \ge 5$ 
and $k+1 \ge 5$, the following quantity has approximately a chi-square distribution 
with $k$ degrees of freedom (5, p. 167): 

$$\chi^2 = \sum_{j=1}^{k+1} \frac{(f_j - np_j)^2}{np_j} = \frac{1}{n}\sum_{j=1}^{k+1} \frac{f_j^2}{p_j} - n. \tag{12}$$

The last member of Eq. (12) is obtained by the relationship: 

$$\sum_{j=1}^{k+1} p_j = 1 \quad \text{and} \quad \sum_{j=1}^{k+1} f_j = n.$$


However, if $p_j$ are unknown and their values depend on the parameters of $F(x)$ to be 
estimated from the data, Eq. (12) still has approximately a chi-square distribution,
provided that the unknown parameters are replaced by their maximum likelihood
estimates and that the degrees of freedom are reduced by one unit for each 
parameter estimated (5, p. 170). The fact that Eq. (12) remains approximately
a chi-square variable, regardless of whether or not $p_j$ are known, indicates 
that it is completely independent of the form of the underlying mortality distribution
$F(x)$ assumed. For this reason Eq. (12) may be used as a criterion for 
judging the goodness of fit between the data and the estimated distributions 
(1, p. 425). Of course, the smaller the $\chi^2$-value the better the fit. A large 
$\chi^2$-value may be thought of as poor fit and an ordinary chi-square table may be 
used to reject the goodness of fit at some preassigned level of significance. 


As a matter of fact, if the form of a Weibull distribution is assumed as 
the general failure-age distribution, Eq. (12) provides a further method of estimating
the Weibull parameters. The method consists of minimizing Eq. (12) with 
respect to $m$ and $x_0$. Of course, we may alternatively minimize $\sum_{j=1}^{k+1} f_j^2/p_j$, which 
for a Weibull distribution is 

$$\Bigl(n - \sum_{j=1}^{k} f_j\Bigr)^2 e^{z_k^m/x_0} + \sum_{j=1}^{k} \frac{f_j^2}{e^{-z_{j-1}^m/x_0} - e^{-z_j^m/x_0}}. \tag{13}$$

In the above, $z_j, f_j$ ($j = 1, 2, \ldots, k$) and $n$ are known life testing data and the 
parameters $m$ and $x_0$ are to be estimated. The pair $m$ and $x_0$ which minimizes Eq. 
(13) is called the minimized $\chi^2$ estimate of the Weibull parameters for grouped 
data. Again an electronic computer is indispensable for the minimization of 
Eq. (13), which is too tedious otherwise. 
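
The criterion of Eq. (12) and a simple search for its minimum can be sketched as follows. The trial grids are supplied by the user; the function names are hypothetical, and the exhaustive search stands in for whatever optimization routine is actually available.

```python
import math


def cell_probabilities(m, x0, z):
    """p_1 .. p_{k+1} of Eq. (6) for inspection times z_1 < ... < z_k (z_0 = 0)."""
    survival = [1.0] + [math.exp(-(z_j ** m) / x0) for z_j in z]
    probs = [survival[j] - survival[j + 1] for j in range(len(z))]
    probs.append(survival[-1])              # p_{k+1} = 1 - F(z_k)
    return probs


def chi_square(m, x0, z, f, n):
    """Eq. (12), with f_{k+1} = n - sum(f) taken as the number of survivors."""
    counts = list(f) + [n - sum(f)]
    return sum((f_j - n * p_j) ** 2 / (n * p_j)
               for f_j, p_j in zip(counts, cell_probabilities(m, x0, z)))


def chi_square_short_form(m, x0, z, f, n):
    """Last member of Eq. (12): (1/n) * sum(f_j^2 / p_j) - n."""
    counts = list(f) + [n - sum(f)]
    return sum(f_j ** 2 / p_j
               for f_j, p_j in zip(counts, cell_probabilities(m, x0, z))) / n - n


def minimized_chi_square(z, f, n, m_grid, x0_grid):
    """Return the (m, x0) pair on the trial grids giving the smallest Eq. (12)."""
    return min((chi_square(m, x0, z, f, n), m, x0)
               for m in m_grid for x0 in x0_grid)[1:]
```

The two forms of Eq. (12) agree for any trial pair, so the shorter one may be used inside the search.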


COMMENTS ON ALL FOUR METHODS OF ESTIMATION 


Method I, employing the least squares on the transformed data, has little 
theoretical justification. This is because of the fact that the least squares 
approach there does not necessarily guarantee the "best" fit of the raw data in 
the Cartesian scale. The practical value of this method lies in the fact that it 
is simple. That is, if either transformed data or log vs. log-log paper is used, 
an approximate estimate of Weibull parameters may be obtained by simply plotting 
the data and estimating them by eye. This graphical solution is, in general, not 
too far off from the theoretically better estimates provided by Methods II, III, 
and IV. 


Method II is the best among the four, if the data are naturally ungrouped 
(e.g., ball-bearing life testing). This method utilizes the precise failure age 
of each item failed in providing the estimates. However, this method would also 
give a reasonably good estimate with data in the grouped form (e.g., electron 
tube testing), provided the grouped raw data are such that the number of failures 
per inspection period is small. In programming this method on an electronic 
computer, Newton's approximation may be used for solving the two simultaneous 
transcendental equations. This procedure converges very rapidly, especially if 
the graphical solution obtained by Method I is fed into the computer as the initial
trial. 
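
One way Newton's approximation might be applied is sketched below. It assumes that $x_0$ is first eliminated between Eqs. (4) and (5), so that the iteration runs on $m$ alone; whether the original program proceeded this way is not stated. The function name and the safeguard that keeps the trial value positive are hypothetical.

```python
import math


def newton_weibull_shape(x, n, m_start, tol=1e-10, max_iter=50):
    """Newton iteration for m in Eqs. (4) and (5), with x0 eliminated.

    x holds the r observed failure ages; the n - r unfailed items are
    counted at the largest observed age x_r, as in Eq. (3).
    Returns (m, x0, number of iterations used).
    """
    r = len(x)
    x_r = max(x)
    mean_log = sum(math.log(v) for v in x) / r
    points = [(v, 1.0) for v in x] + [(x_r, float(n - r))]   # (age, weight)
    m = m_start
    for iteration in range(1, max_iter + 1):
        s0 = s1 = s2 = 0.0
        for age, weight in points:
            log_age = math.log(age)
            term = weight * age ** m
            s0 += term
            s1 += term * log_age
            s2 += term * log_age ** 2
        g = s1 / s0 - 1.0 / m - mean_log                  # zero at the estimate
        dg = (s2 * s0 - s1 ** 2) / s0 ** 2 + 1.0 / m ** 2  # derivative of g
        step = g / dg
        while m - step <= 0.0:                             # keep the trial value positive
            step *= 0.5
        m -= step
        if abs(step) < tol:
            break
    x0 = sum(weight * age ** m for age, weight in points) / r   # Eq. (4)
    return m, x0, iteration
```

Starting from a graphical estimate, or from the value m = 1.7 mentioned earlier for electron tubes, only a few iterations are normally needed.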


Method III is the best if the life testing data are originally obtained in 
grouped form and for some reason or other the group sizes are large; i.e., the 
inspection periods are long so that the number of failures per inspection period 
is not small. We have actually encountered such a case in analyzing a large 
amount of life testing data of electron tubes under the U. S. Army Signal Corps 
Contract No. DA36-039-sc-2524, conducted at GE in Owensboro, Kentucky. When 
this is the case, Method III is preferable because Method II will be relatively 
poor due to severe approximations. Actually, Methods II and III are theoreti- 
cally equivalent; they both provide the maximum likelihood estimates of the 
Weibull parameters. Because of the difficulty in Method III for an analytical 
solution, it will take more computer time than Method II. The computer program 
for Method III will generally consist of a series of trials in the parameter 
space of desired accuracy which converges rather slowly even when a two-stage 
procedure of coarse and fine mesh is chosen. 


Although Method IV is the best from the standpoint of the $\chi^2$-criterion for goodness
of fit, it places a restriction on the data by requiring $np_j \ge 5$ and $k+1 \ge 5$ 
and in addition it lacks some of the desirable properties of maximum likelihood 
estimates (e.g., invariance, sufficiency, efficiency). As in the case of Method 
III, it is also only appropriate for the grouped life testing data. With some 
modifications shown by Cramér (1, p. 26) it can be made theoretically equivalent 
to Method III (a modified minimized $\chi^2$ estimate shown by Cramér is identical to 
a maximum likelihood estimate). The computer program for Method IV is essentially 
the same as that of Method III; i.e., optimization in the selected parameter 
space. Both programs can be shortened considerably in the number of iterations 
if the graphical solution by Method I is fed into the computer as the initial 
trial. 


AN EXAMPLE 


Table I shows the grouped life testing data of five lots of electron tubes 
from a large tube manufacturer from an early report (4). Because of the limitations 
in the minimized $\chi^2$ method and because of our desire to compare these four 
methods of estimation, the data are regrouped so that $k+1 \ge 5$ and hence $np_j \ge 5$. 
Table II shows the estimate of Weibull parameters by the four methods. The goodness
of fit by the $\chi^2$-criterion is in all cases excellent, as indicated by the 
computed probability P's (by curvilinear interpolation) which are to be interpreted
as follows. Implicitly we have conducted a series of statistical tests 


TABLE I 


Life Testing Data 
(number of failures $f_j$ between inspection times 
$z_{j-1}$ and $z_j$ for five tube lots) 


Tube    Inspection     Failures f_j by           Survivals    Totals
Lots    Periods, k     inspection time z_j                    n

1       10             ...                       7            89
2        9             ...                       70           125
3        8             ...                       33           63
4        4             ...                       25           58
5        4             ...                       6            32


TABLE II 


Estimate of Weibull Parameters and Computed Chi-Squares by 
Various Methods of Estimation 


1.5611.7h| 21.3 |6.35|.61/1.7h| 21.1 |6.20).63/1.7h) 21.106) 6.19 


3.12] .87|/1.85/119.75 | 3.09 
1.94}1.39| 37.875]1.67 


of hypothesis that the tube lots are samples taken from the respective Weibull 
distributions with parameters as estimated here. Using a significance level of 
5 per cent we accept the hypothesis if P > 0.05. The degrees of freedom are 
(k - 2) since two parameters are estimated. 


Note the improvement of goodness of fit as indicated by the increasing values 
of P's. Clearly Methods III and IV are more refined in this respect. 


ACKNOWLEDGMENT 


The author wishes to thank Messrs. William Y. Stevens and Birger Lovgren for 
programming these methods on an IBM 650 magnetic drum computer and Messrs. John 
D. Berry and Birger Lovgren for computation at the Cornell Computing Center. The 
work is done under U. S. Army Signal Corps Contract DA36-039-sc-64646, under the 
directorship of Prof. G. C. Dalman. 


BIBLIOGRAPHY 


1. Cramér, H.: "Mathematical Methods of Statistics." Princeton, Princeton 
University Press, 1951. 


2. Kao, H. K.: "Quantifying the Life-Quality of Electron Tubes with the Weibull 
Distribution." Technical Report No. 26, School of Electrical Engineering, 
Cornell University, Ithaca, N. Y. (Nov. 1955). 


3. Kao, H. K.: "A new life-quality measure for electron tubes." Trans. IRE, 
PGRQC-7, pp. 1-11 (Apr. 1956). 


4. Kao, H. K.: "The Weibull Distribution in Reliability Studies of Electron 
Tubes." Technical Report No. 33, Research Report EE 343, School of Electrical
Engineering, Cornell University, Ithaca, N. Y. (Sept. 1957). 


5. Hoel, P. G.: "Introduction to Mathematical Statistics," 2nd ed. New York,
John Wiley and Sons, 1954. 




EFFECTS OF AMBIENT TEMPERATURE ON ELECTRON TUBES 


K. Hopkinson 
Ministry of Supply 
Royal Radar Establishment 
Great Malvern, Worcs., England 


During an investigation into the effects of high temperature and high altitude 
operation on the life of electron tubes I had occasion to refer to an article 
which was published in the 1956 IRE National Convention Record, Part 6. This article
formed part of a section headed "Basic Study of the Effects of Operating 
and Environmental Factors on Electron Tubes" and was called "The Effects of Ambient
Temperature" by Paul F. Barnett. 


Mr. Barnett gives data on the survival rates of five types of electron 
tubes, namely 6005, 6J6W, 5654, 5726, and 5670. Lots of 200 valves were assembled
for each test with all approved manufacturers' products being represented.
Life tests were carried out under JAN specification conditions at various 
ambient temperatures, and measurements of characteristics were made at the life 
test temperature. The following tables of results have been compiled from the 
survival rate curves given in the article and there may be some inaccuracies in 
reading from the published curves. 


TABLE I 

                              Ambient   Bulb      Survival Per Cent
Tube type              Lot    Temp.     Temp.
                       No.    (°C)      (°C)      Hours   Hours   Hours   Hours

5670                   1      Room      100       98      95      95      90
                       2      100       115       95      95      95      88
                       3      175       186       93      93      93      93
                       4      250       261       95      77      74      72

6J6W                   1      Room      ?         98      98      98      90
                       2      100       134       98      95      93      90
                       3      175       201       98      95      90      85
                       4      250       270       96      80      68      40
                       5      300       311       85      50      ?       ?

5654/6AK5              1      Room      100       100     99      96      90
                       2      100       125       100     ?       96      90
                       3      175       192       100     88      84      71
                       4      250       263       100     58      55      32
                       5      300       312       55      25      25      ?

5726/6AL5W             ?      ?         ?         ?       ?       ?       ?
(operated as full
wave rectifier)

6005/6AQ5              1      Room      220       100     97      ?       97
                       2      100       237       100     95      90      80
                       3      175       261       100     65      ?       ?
                       4      250       316       90      30      15      7
                       5      300       347       20      5       0       --


The above results have been used to derive curves of percentage survivals 
against bulb temperature at 250 hours, 500 hours, and 1,000 hours. These curves 
are given in Figs. 1, 2, and 3 and show quite remarkable changes in survival rate 


Fig. 1 - Percentage survival vs. bulb temperature after 250 hours life test.
(Axes: survivals, per cent, vs. bulb temperature, °C.)

Fig. 2 - Percentage survival vs. bulb temperature after 500 hours life test.
(Axes: survivals, per cent, vs. bulb temperature, °C.)

when bulb temperatures exceeding 200°C are recorded. The 6005, which is a "hot" 


tube, is rather better than the other tubes at 200°C but is significantly worse 
at temperatures exceeding 250°C. 


The curves have been "smoothed" to give a reasonably good fit and a further 
table of results is given below by recording the average per cent survival for 


Fig. 3 - Percentage survival vs. bulb temperature after 1,000 hours life test.
(Axes: survivals, per cent, vs. bulb temperature, °C.)


all the curves at various bulb temperatures and for the 250, 500, and 1,000 hour 
life test points. Thus: 


Bulb        Average Survival Per Cent
Temp.       250        500        1,000
(°C)        Hours      Hours      Hours

100         97.5       ?          ?
150         ?          ?          ?
200         ?          ?          ?
250         81.5       ?          ?
300         46         ?          ?


These results have been used to derive the curves given in Fig. 4, and it is 
quite significant that when the results of the original measurements are expressed
in this form it is possible to draw straight lines through the recorded 
points. These are average results for all the five tube types under test, and 
similar curves drawn for each individual tube type will give different patterns 
of curves. However, the "average" presentation given in Fig. 4 leaves no doubt 
that the operation of tubes at bulb temperatures in excess of 200°C will inevitably
cause high failure rates. 


Fig. 4 - Average life test survivals vs. bulb temperature.
(Axes: survivals, per cent, vs. duration of life test, 0 to 1,000 hours.)


It has not been possible to compile the results for life test hours below 
250 hours, so that it is not known whether the linear relationships would be 
maintained from zero hours upwards. If this condition does exist, it would mean 
that 10 per cent of new tubes would fail on insertion into equipments which cause 
bulb temperatures of 250°C to be developed. Further, it would be expected that 
about 40 per cent of tubes would fail on insertion into equipments generating 
300°C. Such temperatures are not uncommon. 


The above analysis was made in order to try to find possible causes for in- 
sertion failures and early life failures in certain equipments which are higher 
than would be confidently expected, having regard for the known performance of 
"reliable" valves under specification and life test conditions. These failure 
rates are not unduly high but are distinctly troublesome, and it would appear 
that as a result of Mr. Barnett's valuable work a possible lead has been given 
which may help us to solve this problem. There would appear to be at least three 
possible lines of action. 


1. Reduce the bulb temperature of all valves below 200°C by various heat con- 
ducting shields. This may not be possible if high ambient temperatures 
are being generated. 


2. Test all valves for use in "hot" equipments at elevated bulb temperatures
and so remove the initial failures. 


3. Develop a short range of high temperature valves for use in equipments
which cannot be cooled or otherwise treated in some manner to reduce bulb
temperatures.




TOMORROW'S QUALITY DEMANDS 


Henry O. Imus
East Lynn, Massachusetts 


Looking ahead has always been important to the dynamics and growth of American 
industry. We are all familiar with the early industrialists, famous for their 
conviction that the American public would accept new ideas and innovations. Such
men as Henry Ford and Peter Cooper who looked ahead and guessed right will always 
be remembered for their contributions. 


An accurate estimate of the future is important because it gives us a goal 
toward which to build. It tells us in which direction to travel. Effort spent 
in preparing for the future may not yield immediate results or profits, but it 
will give a better competitive position in times to come. In fact, with the 
present rapid rate of technological advancement, it may well determine whether or 
not we remain in business. 


Only recently has quality control been considered a separate, important
function in industry. Although there have always been work checkers on the
production line, they had no independent authority and so became subject to pressures
from production schedules or shop politics. A superintendent, for example, to
keep peace might pass on work rejected by the work checker. The foreman, tired
of having his work rejected, would say, "It may not exactly meet all the drawing
specifications, but it will work all right. I know -- I've made them for ten 
years." 


Managers began to realize that quality could not be compromised or bargained 
away when a close correlation between defect preventive programs and appraisal 
and failure costs became evident. Quality control concepts began to spread into 
engineering, manufacturing methods and equipment planning. As the importance of 
quality control grew as a "trans-function" activity, its accountability and 
authority became consolidated within one recognized function -- quality control. 


INDUSTRIAL PATTERNS 


To gain an insight into quality demands of the future we will investigate 
the intimate relationship between quality control and the product and its manu- 
facture. 


First, however, there are several outstanding trends in the evolution 
of American products and their manufacture that should be examined. Automation 
in particular has gained widespread recognition. Since much has been written on
the technical and social aspects of automation, it will not be considered in 
detail. It should be mentioned that the ultimate goal of automation in manufac- 
turing is the complete mechanization and integration of material handling, 
feeding, controlling, and actual production, as well as inspection and test. 


An automated production line then becomes more like a single system than a set of
individual machines separated by human beings. If even one operation breaks down,
the effect may be felt by the entire system, depending on how
closely the system is integrated and what provisions are made for breakdowns. 




Fewer persons per unit output will be necessary to run an
automated factory. The operatives will necessarily be upgraded. They will form a
flexible team, ready to handle various factory problems as they arise. Setup, pre- 
ventative maintenance, and breakdowns will comprise most of their activities. A 
continuous flow of information will come from all parts of the factory to the 
central control station. Here central computers will make decisions and process 
and record information. When a problem arises beyond the capabilities of the 
computers, an alarm will sound, lights will indicate the location of the trouble, 
and a team of maintenance and facility engineers will be dispatched to the area. 
In such a factory, all the planning and designing must be done prior to the
purchase and installation of expensive equipment. Never under any previous system
was the motto "do it right the first time" so true.


Automation does not lay claim to the complete future of manufacturing. 
There will always be those products custom built to individual customer
specifications. All products to be mass-produced must at some time be developed in
small volume. There will always be a demand for special items of which only one
is built for a particular application.


Increasing complexity and miniaturization is another decided trend today, 
especially in the defense and original equipment manufacturing industries. Basic 
scientific advancements in solid state physics, metallurgy, chemistry, and other 
fields have allowed a substantial decrease in the size of basic components and an 
increase in the efficiency of many materials. These are used in turn to manufac- 
ture more complex devices. At the same time a strong effort is made to keep the 
size of the over-all equipment to a minimum. The fantastic reduction in size of 
the basic electronic amplifier component, from an electronic tube to a kernel 
transistor, represents a ratio of one hundred to one. Many other electronic 
parts and equipment have undergone a similar transition. Packaging and methods 
of assembly are making significant contributions to miniaturization. Merely by
rearranging the same components and using new assembly techniques,
such as printed circuit boards, the size of many standard electronic
devices has been substantially reduced.


A third trend in modern industry is precision. Precision is a prerequisite 
for complexity. A simple one-piece product does not need precision. However, as 
soon as it depends on another part to function, tolerances must be specified for 
both parts. This tolerance tightens as the number of components increases. Tens 
of thousandths, microinches, and one hundredths of one per cent are measures 
increasingly used in industry. 


The fourth trend is speed. People are always in a hurry nowadays and the 
drafting board to shipping time is being squeezed hard. Schedules and dates are 
paramount because keen competition among manufactured products exists not only 
in terms of price, but also in terms of time. Time is worth money. 


EFFECT ON QUALITY CONTROL


Automation


Test and inspection equipment will be mechanized and mechanically integrated 
into the production system. This is part of the ultimate over-all automation 
philosophy as discussed previously. The test and inspection equipment of today
is almost completely manual or only partially mechanized, comparatively far behind
production equipment.1 There is, however, equipment either on the drawing board
or on the market that is fully mechanized. Several instrument companies are
marketing air gauging systems that can measure dimensions of parts in motion.
Another development is an electrical testing board. The electronic assembly to
be tested is plugged into the board, test voltages are programmed into the
assembly, and the resulting measurements are read out. Measurements falling
outside tolerances are recorded for analysis and immediate action.


Statistical analysis will be built into the automated inspection equipment. 
Information obtained by the inspection equipment will be fed back to the machine 
computer. The computer controls the position of the tool, compensating for tool 
wear and machine drift. If the normal curve of the output dimensions has merely 
shifted, then changing the tool position will probably remedy the situation. If 
the normal curve of the output has instead spread over the limits on both sides, 
then it probably will be necessary to shut the machine down for maintenance. 
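
The distinction drawn above between a shifted curve and a spread curve can be sketched as a simple decision rule. The Python fragment below is an illustration under stated assumptions, not a description of any particular control computer: the function name diagnose, the use of a plus-or-minus three sigma band as the measure of spread, and the quarter-of-tolerance shift threshold are all hypothetical choices.

```python
from statistics import mean, stdev


def diagnose(measurements, lower, upper, shift_fraction=0.25):
    """Classify the process state from a sample of output dimensions.

    lower, upper   -- tolerance limits for the dimension
    shift_fraction -- assumed fraction of the tolerance width by which the
                      sample mean may drift before the tool is repositioned
    """
    center = (lower + upper) / 2
    width = upper - lower
    m = mean(measurements)
    s = stdev(measurements)

    # Spread check: if the +/-3 sigma band is wider than the tolerance band,
    # the curve spills over both limits even when perfectly centered.
    if 6 * s > width:
        return "shut down for maintenance"

    # Shift check: the curve is narrow enough but has moved off center,
    # as would happen with tool wear or machine drift.
    if abs(m - center) > shift_fraction * width:
        return "adjust tool position"

    return "in control"


shifted = [10.04, 10.05, 10.04, 10.05, 10.06, 10.05]
spread = [9.93, 10.07, 9.96, 10.05, 9.94, 10.06]
steady = [10.00, 10.01, 9.99, 10.00, 10.01, 9.99]
```

Checking spread before shift is a deliberate ordering: repositioning the tool cannot cure a process whose natural variation already exceeds the tolerance band.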


Post-production inspection is common today. Automation will emphasize
in-production and pre-production inspection.2 In-production inspection will provide
quality information while the part is actually being produced. Although it is
impossible to measure the output the very same instant it is being produced,
in-production inspection in practice will occur so close in time that corrective
action will take place as the variable approaches the limit. High-production
automated equipment will demand high-quality incoming material. A faulty part 
that won't fit a hand-made assembly can be easily discarded by the operator. A 
faulty part that won't fit an automatically assembled unit will probably jam the 
machine. Down-time on a highly productive machine is expensive. The elimination 
of any possible defective machine-jamming material will depend on a tight quality 
control assurance program. There are two ways to handle this problem: one is to
assure the output of previous and contributing equipment and of incoming inspection;
the other is to inspect automatically just prior to entrance into the machine
(pre-production inspection).


One hundred per cent inspection will become more widespread. In many cases 
automated equipment to inspect output 100 per cent will cost only a little more 
than equipment to inspect 10 per cent. This leads directly into selective
assembly. Instead of investing in costly capital equipment to hold tight
tolerances, it might well be more economical to sort the pieces as they are being 100
per cent inspected. When parts are made faster than they can be tested or
inspected, a sampling plan would be considered.
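
Sorting pieces during 100 per cent inspection, as a basis for selective assembly, can be sketched as follows. This is a minimal Python illustration, assuming equal-width size classes across the tolerance band; the function names size_class and sort_parts and the number of classes are hypothetical and do not describe any specific sorting equipment.

```python
def size_class(measurement, lower, upper, n_classes):
    """Return the 0-based size class of a measurement, or None if it is
    outside the tolerance limits."""
    if not lower <= measurement <= upper:
        return None
    step = (upper - lower) / n_classes
    # A piece exactly on the upper limit belongs to the last class.
    return min(int((measurement - lower) / step), n_classes - 1)


def sort_parts(measurements, lower, upper, n_classes=3):
    """Sort measured pieces into size classes; out-of-tolerance pieces
    are collected separately as rejects."""
    bins = {k: [] for k in range(n_classes)}
    rejects = []
    for m in measurements:
        k = size_class(m, lower, upper, n_classes)
        if k is None:
            rejects.append(m)
        else:
            bins[k].append(m)
    return bins, rejects


bins, rejects = sort_parts([1.0, 1.5, 2.5, 3.0, 3.5], 1.0, 3.0, n_classes=2)
```

Mating parts drawn from matching classes can then be assembled to a closer fit than the tolerance on either part alone would give.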


The most critical quality control activity will take place prior to actual 
production. Decisions will be made at that time to fix the level of quality. 
New design control will be a must in the automated factory, as automatic equipment
tends not to be flexible. Also, the investment for equipment will be so
great that the cost of making changes in certain design areas will be prohibitive.
Many changes which are being made immediately prior to and during production must
now be made prior to final design.


Quality-mindedness is applied most where the human factor predominates. In
the automated production line the human factor is minimized, and with it the
importance of quality-mindedness on the line itself. Therefore quality-mindedness
must be emphasized with engineering and planning personnel. The personnel dealing
with automation will of necessity have higher qualifications than today's hand
assembly factory
labor. The operatives will be of a higher skill level. There will be few jobs 
requiring little skill. Factory labor will be upgraded. Inspectors, as such, 
just won't exist and in their place will be highly skilled inspection-test equip- 
ment setup and maintenance personnel. 


Gauge control will play an important role in the automated factory. There 
must be a continual program to insure that all automated measuring instruments
are accurately calibrated. In some instances a particularly critical measuring 
instrument might have a second back-up instrument to periodically overcheck the 
original instrument. 


Complexity and Miniaturization 


As the consumer and military demand more and better performance in a small, 
compact volume, the complexity of the product and its manufacture increases. 
Along with increase in complexity goes a growing need for high reliability. Test 
and inspection equipment becomes more intricate, and there has to be more of it. 


Test and inspection planning will be more elaborate. The entire manufacturing 
organization will seek to become efficient and effective in handling a complex 
product. The jigs, fixtures, and other equipment used in inspection and test of
the more complicated assemblies will necessarily be of a higher level than they are
now. The same holds true for the personnel, who must be familiar not only with
the intricate inspection and test devices, but also with the product they are 
testing. 


This leads us right into the importance of quality mindedness in a complex 
and miniaturized product. Although a simple product with a 95 per cent relia- 
bility figure might well suffice, when five simple products go together and have 
to function as one unit, the complex unit is less than 80 per cent reliable.* 


The quality level of all the component parts must be increased to a certain
level, depending on the application. This will reflect back onto the standards
of the quality control organization and of the entire manufacturing and
engineering organization. Reliability must be designed into the product as well as
built into it; quality-mindedness must be strongly emphasized in
manufacturing and engineering; and component reliability must be maintained at an
extremely high level.


There are many more pitfalls in something complex than in something simple. 
The people working with complexity must be on constant guard and supported with a 
program of quality mindedness. The quality control methods will be elaborate. 
The detail of the entire complex must be laid down on paper. Qualified quality
control planning and methods men will be breaking out test and inspection work
elements, simplifying the operational testing and inspecting as much as possible.
Test and inspection equipment engineers will back up quality control activities 
with high caliber instruments and equipment. For the more involved equipment a 
special test and inspection maintenance detail will be necessary. 


*Assuming all five components are essential to the operation of the composite
product and that each individually is 95 per cent reliable, the product rule
tells us that the composite reliability is (0.95)^5, or about 0.77, which is less
than 80 per cent.
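
The product rule cited in the footnote can be checked directly. The short Python sketch below assumes, as the footnote does, that every component is essential and that component failures are independent; the function name series_reliability is an illustrative choice.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a unit in which every component is essential and
    component failures are independent (the product rule)."""
    return prod(component_reliabilities)


# Five components, each 95 per cent reliable.
composite = series_reliability([0.95] * 5)
```

Here composite comes to roughly 0.774. The same function shows how quickly the figure falls as parts are added: twenty such components in series give about 0.36.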




Precision 


With the advent of products demanding, and machines capable of producing,
parts to tolerances in the tens of thousandths and microinches,* the quality
control function must make adequate provision to match these tolerances. Quality
control has already made good use of optics in the inspection function. Other 
physical phenomena are being applied to inspection and test. Quality mindedness 
will play an important part in the manufacture of precision parts. Extreme care 
will be called for on the part of the operatives to set up and keep the machines 
in control. A strong gauge control plan will be necessary to maintain the proper
standards. Perhaps the factory gauge control plan will be tied into some
industrial or national gauge standards system. Cleanliness will be doubly important
in the manufacturing and inspection and test areas. A particle of dust can
change a physical dimension. 


Skilled personnel will be needed in all phases of precision manufacturing, 
including test and inspection. These persons may not be skilled in the handling 
of complex product inspection or test (as discussed in the previous section), but 
they will be required to develop a feeling for small tolerances. They must
handle equipment with care and delicacy. 


Cycle Time 


Even with a tremendous increase in complexity and precision in the products 
of American industry, our customers are further demanding that we substantially 
reduce the production cycle time. In an effort to meet this demand the manufac- 
turing and engineering functions have overhauled and quickened their activities. 
This includes quality control. 


New design control now plays a critical role. Not only must it ferret
out those problems that might bottle up production of the product, but it must do
this in record time. New design control will not be something that a firm would
like to do half-heartedly in hopes of picking off the surplus or obvious savings.
Instead, it will have a specific, serious responsibility. The persons performing
this function cannot be dilettantes; they must be experts in the design of the
product and in the manufacturing capabilities and processes available.


With the time squeeze on all functions, it is well that all quality control 
objectives, methods, and responsibilities be recognized and integrated well into 
the pattern of a fast operation. Under the pressure of shortened delivery cycles 
we have no time for mistakes -- it must be done right the first time.


DEVELOPMENT AND JOB SHOP 


In a development and job shop it is important to realize that less of the
work has been "simplified" through the analysis of a methods man.
As such, each person must have a fuller understanding of the product and the
processes involved. With the type of products that are foreseen, we can surmise 
that it will be of the utmost importance that quality control personnel be of a 
high caliber, with a desire to build a product that works. 


*The Inchworm Motor, manufactured by the Airborne Instruments Laboratory,
Mineola, Long Island, New York, controls certain metalworking machines to hold 
dimensions to plus or minus five microinches. 




Methods and sampling plans designed to efficiently handle a large quantity 
of similar or identical parts are useful in this type of operation only to the 
extent that large quantities of similar or identical parts are actually manu- 
factured. General approaches and techniques will be developed to handle the var- 
ious types of parts and assemblies which together form the job shop product. 


HOW WILL WE ARRIVE? 


Progress in industry is generally of an evolutionary nature. This, however,
does not mean that growth takes place without a good deal of planning and
leadership. As the trends of automation, complexity, precision, and short delivery
continue to grow, farsighted managers must take positive measures to insure
competitiveness through minimal quality costs. In particular, advancement will be
made along the lines of quality control equipment, techniques and methods, and
personnel.


The large firm is advised to have a group of people devoted to future 
quality control. These persons might carry on industrial quality control in new 
equipment evaluation and application. If they intended to actually build inspec- 
tion equipment they might investigate various physical phenomena, many of which 
are being applied to modern inspection and test devices. Radioactivity, air 
pressure, light, magnetics, ultraviolet rays, electronics, chemical techniques, 
spectrography, X-rays, ultrasonics, interference, and diffraction are but samples 
of physical phenomena that can be applied to quality control technology. Another 
activity of advanced quality control will be in the area of new statistical 
techniques and tools. As A. V. Feigenbaum says, "For too many years we operated 
by warming over the basic work of Dr. Shewhart and others."2 New methodology 
will be required to effect the optimum use of new equipment. Note carefully that
"optimum" means complete automation in some applications and virtually no
automation or mechanization in other applications.


Although the smaller firm may devote less manpower to this area, it must
keep pace with the total industry. As a vendor to a larger firm it will
certainly become enmeshed in the quality control philosophy and practices of the
larger company. As the field of quality control develops into an integrated 
activity, the flow of information within itself will tend to transfer new ideas 
from one firm to the next, and from one industry to another. 


TOTAL QUALITY CERTAINTY AT MINIMUM COST 


Quality certainty for a minimum cost at the customer level is the
responsibility and objective of quality control. "Quality certainty" refers to the
quality control activities that will cause the entire plant to produce a quality 
product. "At the customer level" indicates that quality must be at the level 
desired by the customer. Excess quality, not desired or paid for by the cus- 
tomer, is uneconomical. Substandard quality is also, needless to say, unprofit- 
able. "For a minimum cost" requires quality control to contribute to profit 
through minimizing quality costs. 


We have seen a preview of the quality control of the future. Automation, 
complexity, and the other rapidly developing industrial trends will challenge 
quality control to develop new techniques, new personnel, and new equipment. It 
is noteworthy that this challenge is not limited to the industries primarily 
affected by these trends. The concept of reliability, as related to increasing
complexity, demands a far more dependable product than old-fashioned
component parts manufacture could supply.


BIBLIOGRAPHY


"Automation-Continuous Automatic Production." General Electric GEA-6405.


Feigenbaum, A. V.: "The new approach to quality control." Factory
Management and Maintenance (Mar. 1957).


"The Inchworm Motor for Centerless Grinder Control." Airborne Instruments 
Laboratory, Inc., Mineola, New York.


"Automation and Process Control." Section V, Quality Manual, General
Electric Manufacturing Services, New York, New York.


"Mechanization and Automation in Test." Section II, Quality Manual, General
Electric Manufacturing Services, New York, New York.


Aller, W. F.: "Built-in Quality Control for Metalworking." The Sheffield 
Corporation, Dayton, Ohio. 


Aller, W. F.: "Measure your quality automatically." Automation (Mar. 1957). 


Bercow and Levy: "Comparative Study of Industrial Quality Control Programs." 
Massachusetts Institute of Technology, MS B&E Thesis, 1953. 




PROGRESS IN TV-RECEIVER RELIABILITY 


E. H. Boden
Sylvania Electric Products, Inc. 
Radio Tube Division 
Emporium, Pennsylvania 


In just over ten years, television has grown from an oddity to one of the most 
common furnishings in the American home. The average householder regards his TV 
set as an essential appliance almost on a par with his telephone or the pop-up 
toaster on his breakfast table. We of the electronics industry, associated with 
the problems of producing this item of standard living room furniture, know that 
the behind-the-scenes problems have been far from commonplace. Since the early 
days set reliability, as a dependent function of tube and other component
reliability, has been one of the most troublesome problems besetting us. Yet many of
us tend to forget the steady growth in component reliability which has taken 
place in these few years. 


Since 1950 the Radio Tube Division of Sylvania Electric Products at Emporium 
has been conducting life performance tests on Sylvania tubes in various makes and 
models of television receivers. The year-to-year operation of this life test 
program has proved most valuable in the maintenance and improvement of tube life 
performance. Also, valuable information has been supplied to the Radio Tube
Division's Application Engineering and Field Engineering Departments concerning the
performance of Sylvania tubes in sets of various manufacturers. 


During the first few years of the program, various test conditions and pro- 
cedures were studied to find that combination which would provide the maximum in- 
formation in a minimum of time. The test conditions, procedures, and data hand- 
ling methods described here have proved to be the best for the information 
desired. Sufficient data have now been collected to reveal trends in tube
life performance, and an attempt has been made to draw from the results some
conclusions concerning both tubes and television receivers. This paper is a report
on life test data collected in the last three years resulting from the testing of
15,089 tubes in 1.4 million set-run hours. All data were tested for statistical
significance at the .05 level, which means that a difference as large as the one
observed would arise by chance only one time in twenty if no true difference existed.


TEST CONDITIONS AND PROCEDURES 


Figure 1 shows a portion of the area being used for set life-testing tubes. 
Here the sets are run for a period of 1,500 hours, during which time the sets are 
automatically cycled on 50 minutes and off 10 minutes of each hour, with two
additional manual cycles of one hour off during each 24 hour period. Fifteen
hundred hours approximates one year of operation. Brightness and contrast controls
are adjusted for normal viewing. 


A line voltage of 130 volts was selected to produce an accelerated life
condition. The degree of acceleration using 130 volts had to be determined from
experimental data. For this purpose a representative group of receivers was
selected and run for 1,500 hours. Half of the receivers were operated at 130 volts
line and half of them were operated at 117 volts line. At the completion of the 


Fig. 1 - TV life-test area.


run, sets operating at 130 volts had 2.4 times as many failures as the sets
operating at 117 volts. This would seem to say that one year at 130 volts was
approximately equivalent to 2.4 years at 117 volts. This same test is repeated
each year with current models of receivers to determine if any change might have
occurred in this acceleration constant. As yet there has been no significant
difference in this constant.


From time to time, television receivers are obtained in groups of ten to
twenty sets of each model. One hundred eighty to 250 sets are under test at all
times, with upwards of eleven set manufacturers being represented. Each group of
receivers is first run 1,500 hours as complemented when received. At the
completion of the first run, the receivers are completely retubed and then run for
another 1,500 hour period. This is repeated until the sets are replaced by newer
models. Each group of sets is used for as few as four runs or as many as seven
runs.


When a tube fails during a test run, it is removed from the receiver and a
replacement is inserted to serve the set for the remainder of the run. Failure
of the replacement tube is not included in the data but is noted to detect the
existence of a critical application in a particular socket. Although important
picture tube information and other circuit component data have been obtained,
this paper covers only receiving tubes.


All tube failures are carefully studied to determine if any other component 
failure might have caused the tube to fail. The tubes are visually and elec- 
trically analyzed to determine the cause of the failure. A failure is regarded 
as that level of performance that would cause a set owner to call a serviceman. 
Records of all tube failures are made on standard McBee Keysort punch cards 
(KS371N). A sample card is shown in Fig. 2. 


By pre-established numerical and direct coding, information concerning 
failure cause, location, and time is punched on the card. In the same manner, 
other information identifying the various kinds of receivers is stored. By a 
simple process called needling, cards representing tube failures are sorted and 
classified to provide the data desired. In this way expected set survival is 
calculated and failures are grouped according to causes and location. 


Expected set survival is calculated from the tube failures and not from ac- 
tual set failures. In this way maximum use is made of the acquired data and a 
more accurate picture of set survival on the basis of tube failures is obtained. 
The method is best explained by an example. Consider a hypothetical case in- 
volving ten sets, where at the end of 200 hours two sets had a horizontal- 
amplifier tube failure. In this case the calculated set survival would be 80 per 
cent, or 80 sets out of 100. Now consider the event where at the end of 200 
hours one set had a failure of the horizontal-amplifier tube and another set had 
a damper tube failure. Here the probability that the two events could have oc- 
curred in the same set must be considered, and so the expected set survival is 
9/10 x 9/10 which equals 81 per cent, or 81 sets out of 100. In the same way, 
if both the horizontal-amplifier tube and the damper tube had failed in the same 
set at the end of the 200 hour period, the probability that one of the two fail- 
ures could have occurred in one of the other nine receivers must be considered. 
Therefore, the expected set survival is still 81 per cent. 
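
The calculation in the example above can be written out as a short Python sketch. It assumes, as the worked figures imply, that failures of different tube types are treated as independent and that each tube type contributes a survival factor of one minus its failure count divided by the number of sets; the function name expected_set_survival is hypothetical.

```python
from collections import Counter
from math import prod


def expected_set_survival(failed_tube_types, n_sets):
    """Expected fraction of sets surviving, computed from tube failures.

    failed_tube_types -- one entry per failed tube, naming its tube type
                         (which set it failed in is deliberately ignored)
    n_sets            -- number of sets in the run
    """
    failures_by_type = Counter(failed_tube_types)
    return prod(1 - count / n_sets for count in failures_by_type.values())


# Two horizontal-amplifier failures among ten sets.
same_type = expected_set_survival(
    ["horizontal amplifier", "horizontal amplifier"], 10)

# One horizontal-amplifier failure and one damper failure among ten sets.
two_types = expected_set_survival(["horizontal amplifier", "damper"], 10)
```

The first case gives 0.80 and the second 0.81, matching the figures in the text. Because the function never asks which set a failure occurred in, the result is also 0.81 when both tubes fail in the same receiver.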


Fig. 2 - Card used to record data.
(Legible card fields: "Date set received"; "No. of sets in run.")


Fig. 3 - Expected number of sets surviving per 100 sets;
Sylvania tubes, 10 TV sets, 1st 1,500 hour test.
(Axes: expected set survival vs. hours of life, in hundreds.)


At the completion of each 1,500 hour run, cards representing tube failures 
are collected and delivered to the Statistical Engineering Department. Here the 
tube failure data is analyzed for statistical significance and a report is pre- 
pared. An important part of each report is a curve showing expected set survival 
vs. hours. One such curve is shown in Fig. 3. This curve is computed from the 
tube failures occurring in ten or more receivers and is a prediction of the num- 
ber of sets surviving, by hours, with each tube failure regarded as a complete 
set failure. If 100 sets were run and one tube failed, it is recorded as a
complete failure of the set.


Of primary interest to a tube manufacturer is how the tube failure figures
have varied in the past three years. The table in Fig. 4 shows the per cent of
the tubes which failed from July to July of the years indicated. The
differences in the figures shown are significant, and it may be correctly concluded that
in the past three years there has been an improvement in television sets and/or
tube designs. To remove the "and/or" question, the table in Fig. 5 was prepared.
This table represents a compilation of data obtained by using the same group of
sets to accumulate failure rates of tubes from three different production years.
In this way the variable of receiver changes has been removed and the comparison
becomes strictly one of tubes. In the table obtained it is seen that in 1954-55, 7.7
per cent of the tubes tested in a certain group of receivers failed, while in the
following year these same receivers, complemented with tubes manufactured one
year later, showed only 6.2 per cent of the tubes tested failing. The 1.5 per
cent difference is a significant one and, therefore, it may be concluded that
tubes did improve that year. By going further with a second group of receivers,
an improvement in tube life performance of 3.4 per cent is noted.
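
A year-to-year difference in failure percentage can be tested for significance along the following lines. The paper does not say which test was applied, so the pooled two-proportion z test below is an assumption, as is the function name; the counts are the yearly tubes-tested and failure totals from the paper's tabulation for 1954-55, 1955-56, and 1956-57.

```python
from math import erf, sqrt


def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Pooled two-proportion z test.

    Returns (z, two-sided p-value) for the difference between the failure
    proportions fail_a / n_a and fail_b / n_b.
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail area of the standard normal distribution.
    p_value = 1 - erf(abs(z) / sqrt(2))
    return z, p_value


# 1954-55 (328 failures in 4250 tubes) vs. 1955-56 (387 in 5953).
z_first, p_first = two_proportion_z(328, 4250, 387, 5953)

# 1955-56 (387 in 5953) vs. 1956-57 (203 in 4886).
z_second, p_second = two_proportion_z(387, 5953, 203, 4886)
```

Under this assumed test both year-to-year differences fall below the .05 level, which is consistent with the statement that the differences are significant.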


               NO. TUBES     NO.           %
               TESTED        FAILURES      FAILED

1954-55        4250          328           7.7
1955-56        5953          387           [illegible]
1956-57        4886          203           [illegible]


YEARS                        NO. TUBES
TESTED                       TESTED

SAMPLE I (12 MODELS - 7 MFRS.)
  1954-55                    [illegible]
  1955-56                    [illegible]

SAMPLE II (4 MODELS - 4 MFRS.)
  1955-56                    [illegible]
  1956-57                    [illegible]


How failure rates vary among sets made by different manufacturers is another
very interesting question, and it is answered by the curves shown in Fig. 6.
Illustrated here are the expected set survival curves for two generally similar sets
made by different manufacturers. These curves are consistent with the curves of
other years for the same manufacturers and, therefore, are not peculiar to a
certain year. It should be noted here that the designs of some manufacturers' sets have
improved, while others have taken the opposite course.


Fig. 6 - Computed set survival of sets manufactured.
(Axes: per cent set survival vs. hours of life, in hundreds.)


In the past few years, the number of 600 milliampere series-heater receivers
has equaled and exceeded the number of transformer-powered receivers. By sorting
the cards for series-heater set failures and computing the expected set survival
for each of the past three years, the curves shown in Fig. 7 were obtained. It
would seem from these curves that there has been no improvement in tubes in the
past three years of the kind suggested by the table in Fig. 5. However, when the
same curve was constructed for transformer-powered receivers, a different result
was obtained. As shown in Fig. 8, a four-to-one improvement in expected set
survival has resulted in the last three years. The curves shown for the
transformer-powered receivers reveal that controls on heater specification brought
on by the series-heater sets have also contributed to improved set survival in
transformer-powered receivers.
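
One way to picture a set survival curve is as the percentage of sets that have not yet had any tube failure by a given hour, since a single tube failure is counted as a set failure. The sketch below is only an illustration under that reading and is not the computation used in the program. The failure hours are made up, and sets without a recorded failure are assumed to have run for the whole test.

```python
def percent_set_survival(first_failure_hours, n_sets, checkpoints):
    """Per cent of sets with no tube failure yet at each checkpoint hour.

    first_failure_hours holds the hour of the first tube failure for each
    set that failed (hypothetical card data); sets not listed never failed.
    """
    curve = []
    for t in checkpoints:
        failed = sum(1 for h in first_failure_hours if h <= t)
        curve.append((t, 100.0 * (n_sets - failed) / n_sets))
    return curve


# Hypothetical example: 10 sets, three of which lost a tube during the test.
EXAMPLE_FAILURES = [150, 420, 900]

if __name__ == "__main__":
    for hours, pct in percent_set_survival(EXAMPLE_FAILURES, 10, range(0, 1601, 400)):
        print(f"{hours:5d} h  {pct:5.1f} per cent surviving")
```

Sorting the cards by heater type, as described above, would correspond to calling the function once for each group of sets and each year.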


To assist the factory in improving tube survival, tube failures were grouped 
according to frequency of causes. A list of the most frequent causes is given in 


    YEAR             NO. OF    NO. SYLVANIA    NO. SYLVANIA    PER CENT
                     SETS      TUBES TESTED    TUBES FAILED    FAILURE

    JULY '54-'55     80        1230            59              4.8
    JULY '55-'56     92        1438            85
    JULY '56-'57     153                       99


[Figure: per cent set survival plotted against hours of life (hundreds).]

Fig. 7 - Per cent set survival, TV sets complemented with
         Sylvania tubes, series-heater string sets (includes
         series-parallel string category).


    YEAR             NO. OF    NO. SYLVANIA    NO. SYLVANIA    PER CENT
                     SETS      TUBES TESTED    TUBES FAILED    FAILURE

    JULY '54-'55     120       2300            209
    JULY '55-'56     157       3244            205
    JULY '56-'57     89        1291            58


[Figure: per cent set survival plotted against hours of life (hundreds), 0 to 16.]

Fig. 8 - Per cent set survival, TV sets complemented with
         Sylvania tubes, transformer-powered sets.


Fig. 9. From this table it is seen that of the four major failure causes, per 
cent failures have been reduced by a factor of three to one, on the average. As 
of 1957, the big offender is a collection of miscellaneous little items, of which 
there are some twenty-two. 


Another informative table is found in Fig. 10. This table shows those
applications with the highest frequency of failures. Because of the severe
requirements placed on these tubes, over 65 per cent of all tube failures fell in
one of these four locations. However, in spite of the high percentage of failures
in these applications, a significant improvement in tube survival has been
achieved in all four applications.


At Emporium, Pennsylvania, there are two experimental satellite television
stations in operation on Channels 22 and 82. This makes Emporium particularly
well suited for the comparison of expected set survival of vhf sets and vhf-uhf
sets. As part of the life-test program, a uhf receiver must be capable of
satisfactorily receiving off-the-air signals on both Channels 22 and 82. To
eliminate as many variables as possible, only vhf-uhf receivers were used for this
comparison. The cards were needled to drop out those cards representing tube
failures which occurred in sets having uhf. Then, by needling the cards
representing failures of the uhf oscillator tubes, expected set survival curves for




[Table (Fig. 9): per cent failures by failure cause. Legible causes: open heater;
short circuits; open welds; other (22 items). A fourth major cause is illegible.
Number of tubes tested: 4886.]


[Table: per cent failures by circuit for July '55-'56 and July '56-'57. Circuits
listed: horizontal amp; vertical amp; damper; vhf cascode amp.]

Fig. 10 - Per cent failures of the tubes tested in 
the circuits listed. 


[Figure: computed set survival plotted against time in hours, 0 to 1600.]

Fig. 11 - Computed set survival of vhf and vhf-uhf
          receivers.


vhf receivers and vhf-uhf receivers were plotted, as shown in Fig. 11. According
to the computed curve, at the end of 1,500 hours there would be 1.2 fewer
receivers surviving because of uhf. Therefore, although there is an additional
tube in a receiver with uhf, the expected set survival at 1,500 hours is not
significantly degraded by the addition of uhf.
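
The effect of one additional tube can be reasoned about with a simple series model: if tube failures are independent, the survival of the vhf-uhf receiver is the survival of the vhf receiver multiplied by the survival of the added uhf oscillator tube. The sketch below assumes that model. Both survival figures are hypothetical, chosen only to illustrate a loss of roughly one receiver in a hundred, and are not read from Fig. 11.

```python
def survival_with_added_tube(set_survival, tube_survival):
    """Series model: the receiver survives only if the original set and the
    added tube both survive (failures assumed independent)."""
    return set_survival * tube_survival


# Hypothetical survival fractions at 1,500 hours.
VHF_SURVIVAL_1500H = 0.60
UHF_OSC_SURVIVAL_1500H = 0.98

if __name__ == "__main__":
    combined = survival_with_added_tube(VHF_SURVIVAL_1500H, UHF_OSC_SURVIVAL_1500H)
    drop_per_hundred = 100.0 * (VHF_SURVIVAL_1500H - combined)
    print(f"vhf-uhf survival: {combined:.3f}")
    print(f"fewer receivers surviving per hundred: {drop_per_hundred:.1f}")
```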


CONCLUSION


The test conditions, procedures, and data processing methods employed in 
this program have provided a wealth of information from which these conclusions 
have been drawn. 


1. Set survival is dependent on both tube and set design. 


2. Sylvania has improved tube survival, which has brought about a significant
   increase in set survival over the past few years. For further improvement
   in set survival, improved tube design will have to be supported by improved
   set design. It is hoped that the information supplied here will
   encourage equipment manufacturers in the direction of improved set design
   and thus gain a more favorable acceptance of their products by the
   consumer.


3. A knowledge of failure causes and locations is a most valuable aid in im- 
proving set survival. 


4. Receiver life is not significantly affected by the addition of uhf.


These and other conclusions have been drawn from the accumulated data. Because
of the flexibility of the above method of information storage, many other
conclusions may be drawn with ease and speed.




RELIABILITY CONTROL BASED ON MULTIPLE SEQUENTIAL FEEDBACK 


C. M. Ryerson 
Radio Corporation of America 
Camden, New Jersey 


Summary - The classical approach to reliability improvement is based on a single
feedback loop embracing design, development, production, and field service. A
procedure of multiple sequential feedback is described which ties in with
reliability prediction to provide a specified reliability on the first production
run. Techniques are described which can be applied to many industrial operations.
Illustrations show how tests and analyses of various kinds fit in.


INTRODUCTION 


The classical approach to the production of reliable equipment depends on 
the use of feedback information from the field to guide redevelopment efforts. 
Nearly all reliable electronic equipment now in use was developed this way. A 
design was conceived, manufactured, and put into use. As results from the field 
revealed weaknesses in the design or construction, this information was collected
as a basis for specific corrective measures to be incorporated in subsequent
contracts for redesign. The really mature designs now in use are mostly the result
of many such cycles of redesign based on the evaluation of field results. 


One major weakness of this classical approach is the length of time, often 
amounting to many years, that is commonly required between the time of the first 
design and the time that the final perfected equipment is available for general 
use. Unfortunately, the modern rate of obsolescence is faster than the rate of 
maturation by this classical approach. A common net result is that many new 
basic designs are never fully perfected and evaluated for their maximum capabil- 
ity before they become superseded by newer likewise unperfected equipments. 


Although steps to improve various phases of the classical approach have been 
helpful, the need is for general use of a different approach such as is described 
herein. This new approach, based on multiple sequential feedback instead of the 
classical single feedback loop, is planned to produce mature designs on each 
first production run. This paper describes the basic features and weaknesses of 
the classical approach, reviews efforts that have been made to improve this
approach, explains why a new approach is needed, and proposes for general use a
multiple feedback approach that is proving successful at RCA.


THE CLASSICAL APPROACH 


The classical approach to reliability improvement consists of a single in- 
formation feedback loop embracing design, development, production, and field ser- 
vice. This is illustrated in Fig. 1. A so-called "good practice" is employed in 
each of the development and production stages. With the advances which have
occurred in the state-of-the-art over the past years, the requirements for what
constitutes good practice in each technical area have changed. The main approach, 
however, has generally remained the same since the earliest days of engineering. 


The field information obtained after each redevelopment cycle serves to
trigger a limited number of the more obvious and less expensive corrective
measures to be taken during the next redesign cycles. These measures are evaluated
in turn for their effectiveness during subsequent field use. Many total redesign
cycles are sometimes necessary for the complete evaluation of many important
design variables. This process is slow and many desirable corrective measures are
never taken because of the cost and delay associated with a complete redesign
cycle or because evidence does not seem to warrant them.


[Figure: block diagram. General equipment specifications; "good" design and
engineering practice; "good" production practice; information feedback.]

Fig. 1 - The classical approach to reliable equipment.


During the early days of radio so little was known about most of the elec- 
tronic circuits and parts that the state-of-the-art and the related techniques 
which could be considered as good practice were at a very low technical level.
At the start of this new electronics industry nearly all of the development work
could be classified as cut-and-try. As the knowledge of physics and materials 
advanced, techniques were devised which elevated the level of good practice. New 
theories and design guides culminated in handbooks and engineering tables which 
have themselves been modernized and improved many times. This has been the pat- 
tern to this day, and in many places the accepted procedure is still one of en- 
gineering to specifications from handbooks, then manufacturing, and finally field 
testing in order to evaluate the design. Good and reliable equipment can still 
be produced by this method if you can afford the time and money. Unfortunately, 
however, the state-of-the-art has been accelerating so rapidly that now many 
basic designs become obsolete in their function before they can traverse the nec- 
essary number of redesign cycles required for their perfection. 


This problem of obsolescence before maturity has now become serious because 
of the current rate of technological advance. New requirements come into effect 
faster than designs can be perfected to meet the old requirements. A brief his- 
torical review of some major technical events may serve to illustrate the fast 
upward rate of change of the state-of-the-art. 


It is sometimes hard to realize that it was only a little over a hundred 
years ago that military cannon were cast of iron or bronze and bolted solidly to 
wooden carriages. Night illumination at that time was largely a matter of open 
flames. Just fifty years ago the best cannon had become steel rifles but were 
dependent on gears and cams for manually cranking the barrel into position. 
Electricity was only beginning to be used for general lighting, and wireless 
communication was only a laboratory novelty. The first airplane flights were 
not made until 1903, and it wasn’t until near the end of World War I that the 
first aerial dogfights occurred using pistols and hand-held rifles as armament. 
It was only about twenty-five years ago that a few simple radios first began to
appear in airplanes.


Since that time technological advance has occurred at an ever-increasing
rate. Today there are artificial satellites circling the earth with automatic
telemetering. Tomorrow we might see a manned fort in the sky, and the next day
a completely automatic space station. Such fantastic developments are no longer
idle fantasies but have become practical engineering problems. Indeed, the limit
to the extremes that technical complexity and automation can go is fixed by
the reliability that can be achieved in the designs.


The rate of technical progress has been "snowballing," but the rate at which 
industry can produce mature designs has not kept pace. The deficiency is in the 
classical approach to product improvement. The need is for improved means of
obtaining mature designs at a much faster rate.


AN INTERMEDIATE IMPROVED APPROACH 


An improvement in the classical approach is illustrated in Fig. 2. There is
much work being done toward developing better specifications, better engineering
and design practices, and better production techniques. The latter includes the
general use of statistical quality control in the factory. Improved field
conditions, better operator and maintenance training, and elaborate automatic
tabulations of field failure data are steps being taken to improve the classical
approach. The emphasis is on getting better data faster so that the redesign cycle
can be shortened. Major efforts are also under way to improve engineering so
that the available parts will be better and so that designs will make better use
of the parts. Techniques are being devised to ensure that circuits will have
greater safety factors and be more tolerant of parameter changes. Emphasis is
being placed on ease of maintenance, better maintainability, marginal testing,
and other improved field techniques.


[Figure: block diagram. "Better" specifications; "better" engineering and design
practice; "better" production practices; improved field conditions.]

Fig. 2 - An improved approach.


These efforts are good and are helping produce better equipment, but they
are not enough. This improved classical approach still requires field evaluation
of each design and each design change, and this is not a good enough technique to
raise the equipment maturity rate to the level of the modern obsolescence rate.
A new approach to the fast development of mature designs is needed. Such an
approach is available and has been proven practical by RCA; it is called the
multiple sequential feedback approach.


MULTIPLE SEQUENTIAL FEEDBACK 


In brief, the multiple sequential feedback approach replaces many full redesign
cycles by multiple prediction of the probable results of many tentative design
changes. Analysis and prediction based on design consultation and statistical
techniques enable the equivalent of many redesign loops to be performed
rapidly on paper. When this process yields a design that can be predicted to
have the highest possible field success, the design is frozen. Subsequent special
quality control loops prevent the manufacturing cycle from degrading the
potential reliability inherent in the design, and a final product evaluation
phase following production establishes the success of the whole multiple control
process before the product is released for use.


In place of the single feedback loop characteristic of the classical ap- 
proach, the multiple sequential feedback approach employs six major control loops 
and many secondary control loops within these. This approach is illustrated in 
Fig. 3. Each control loop consists of feedback from specific analyses which de- 
termine the progress of each project from one stage of development or manufacture 
to the next. The first information feedback from the field is thus only a check 
on the effectiveness of the many anticipatory corrective measures which have been 
previously taken during the various stages of design and manufacture. 


The various control loops function by putting emphasis on obtaining adequate 
information in the correct form at the right time for each management decision on 
each project. Thus the reliability program on each project is continually chang- 
ing to meet the specific needs of the project as it progresses. In general, spe- 
cific assignments in key areas of each operation are made to reliability repre- 
sentatives. These people are required to obtain a certain type of information 
and make it available on time for use by the regular line operation. This might 
be called control by education and analytical review. The responsibility for 
action remains with the regular line management. The permanent record of their 
achievement in producing a reliable product reveals their compliance with the 
program. Certain specific objectives differentiate the functions of the various 
control loops. The following sections summarize them. 


Contract Control


Successful operation of reliability contract control results in contracts
which contain specific and realistic over-all equipment or system reliability
goals. The cost of achieving these reliability goals must be recognized in the


[Figure: flow diagram of reliability control showing the six major control loops,
with secondary control loops within each major area. Legible stage labels:
Customer and Field Technical Liaison; Development and Project Coordination;
Reliability Engineering; Manufacturing Quality Control; Product Evaluation;
Field Use Analysis. Legible loop labels: Factory Control; Product Warranty
Control.]

Fig. 3 - The multiple sequential feedback approach.


[Figure: block diagram. Stage: Customer and Field Technical Liaison. Legible
control factors: field need analysis; technical consultation; specification
preparation assistance. Output: realistic over-all equipment or system
reliability goals plus reliability cost provisions in the contract.]

Fig. 4 - Contract control.


contract and provision made for direct reliability charges. The factors involved
in achieving this result can be summarized as follows (see Fig. 4):


 1. Field need analysis
 2. Customer technical liaison
 3. Specification preparation assistance
 4. Preliminary reliability analyses and comparison with other projects
 5. Interpretation of contract terms
 6. Concise summary of project objectives
 7. Analysis of unique or critical contract requirements
 8. Digest of applicable mil. specs
 9. Emphasis on criteria for acceptable performance
10. Review of unusual or critical conditions
11. Arrangements for engineering liaison with appropriate customer contacts
12. Formal release of project to engineering with clear-cut program
    requirements.


System Control 


The system control operation results in clarification of all reliability 
goals as they apply to systems, equipments, and components. Specific objectives 
for reliability achievement are developed and furnished to the design groups 
along with their other design criteria such as function, embodiment, cost, etc. 
These reliability goals must be realistic and possible to achieve. A first re- 
port to the customer should explain how these goals were developed and how they 
will result in an end product which meets the field need and the contractual
obligations. Probable difficulties in the path of achieving these goals should be
explained so that the customer will be aware of the validity of his expectations. 


Sometimes a project may consist of a redesign of an existing equipment which
must be reproduced in new form or size to perform the same functions but in a
more severe environment. Since the same electrical functions are involved, the
original circuit can serve as the basis for setting the new reliability goals.
Suppose, for example, that the project is to subminiaturize a 1,000 part
equipment to fit a smaller space and to operate at a higher local ambient air
temperature. Suppose also that the original equipment performed with a 200 hour
mean life at 40°C and that the new version should have about 500 hour mean life
at 80°C. From Fig. 5 it can be seen that the equipment of 1,000 parts with a
mean life of 200 hours is operating at about the level of .5 per cent per
thousand hours average failure rate for the parts.


This failure rate is not too difficult to achieve at 40°C. The new requirement
is quite different, however. To make the new equipment as good as a 500
hour mean life requires (from Fig. 5) an average part failure rate of about .15 
per cent per thousand hours. It is presently impossible to buy many types of 
parts which the part suppliers will certify to this level at 80°C. Thus, such a 
project will require many special reliability screening tests and involve many 
difficult vendor and procurement problems. 
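
The readings taken from Fig. 5 can be cross-checked with a simple series model. The sketch below assumes that every part has the same constant failure rate and that any part failure fails the equipment, so that mean life M = 1/(N x rate) for N parts. This assumption may not be exactly how Fig. 5 was constructed, and the function name is illustrative.

```python
def avg_part_failure_rate_pct_per_1000h(n_parts, mean_life_hours):
    """Average part failure rate, in per cent per thousand hours, needed for
    an equipment of n_parts to reach the given mean life (series model)."""
    rate_per_hour = 1.0 / (n_parts * mean_life_hours)
    return rate_per_hour * 1000.0 * 100.0


if __name__ == "__main__":
    for mean_life in (200, 500):
        rate = avg_part_failure_rate_pct_per_1000h(1000, mean_life)
        print(f"{mean_life} h mean life: {rate:.2f} per cent per 1000 h per part")
```

For 1,000 parts and a 200 hour mean life this gives .5 per cent per thousand hours, in agreement with the text. For a 500 hour mean life it gives .2 per cent, somewhat above the figure of about .15 read from the chart, so the chart may embody additional assumptions or the reading may be approximate.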


In order to clarify how severe the new procurement will be, a part count of 
each type can be made from the old equipment. This count may show, for example, 
that the major procurement problems will be in relays and tubes; resistors,
capacitors, etc., may be available to meet their required failure rates. The
gathering of this information and the reporting of it to the customer is an
important part of the system control. It keeps the customer informed of the
state-of-the-art and how his project fits into it. This effort also guides all
subsequent
engineering efforts by revealing the areas needing the most attention. 
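
A part count of this kind can be turned into a rough failure rate budget. The sketch below is a minimal illustration and not RCA's procedure: the part counts and the available failure rates (in per cent per thousand hours) are hypothetical, and a part type is flagged simply when its best available rate exceeds the required average rate.

```python
def rate_contributions(part_counts, rates_pct_per_1000h):
    """Contribution of each part type to the equipment failure rate."""
    return {kind: part_counts[kind] * rates_pct_per_1000h[kind]
            for kind in part_counts}


def mean_life_hours(total_rate_pct_per_1000h):
    """Equipment mean life for a total rate given in per cent per 1000 h."""
    return 100000.0 / total_rate_pct_per_1000h


def problem_part_types(part_counts, rates_pct_per_1000h, required_avg_rate):
    """Part types whose available rate exceeds the required average rate."""
    return [kind for kind in part_counts
            if rates_pct_per_1000h[kind] > required_avg_rate]


# Hypothetical 1,000 part equipment and hypothetical available rates.
COUNTS = {"tubes": 50, "relays": 20, "resistors": 500, "capacitors": 430}
RATES = {"tubes": 3.0, "relays": 1.0, "resistors": 0.05, "capacitors": 0.1}

if __name__ == "__main__":
    total = sum(rate_contributions(COUNTS, RATES).values())
    print("predicted mean life:", round(mean_life_hours(total)), "hours")
    print("problem part types:", problem_part_types(COUNTS, RATES, 0.2))
```

With these made-up figures the tubes and relays dominate the total, which is the kind of result that would direct subsequent engineering attention.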


If the process of setting part goals to meet the customer request shows that 
the project is impractical and doomed to failure because of impossible failure 
rates being required, the customer should know this at an early hour before money 
is wasted in development. Perhaps the requirements could be lowered in such 


[Figure: chart of number of parts (including tubes), scale 100 to 10,000,
against equipment hours mean life (M).]

Fig. 5 - Equipment reliability vs. design level.




[Figure: block diagram. Input: specifications and field need information. Stage:
System Development and Project Coordination, with specification interpretation,
system design, and inter-system data coordination. Output: specific sub-system
reliability goals and an integrated reliability program. Release to product
design.]

Fig. 6 - Reliability system control.


cases or provisions made for supplying cooled air to the equipment. Also, such
information may lead to direct projects for specific part improvement.


The factors related to system control can be summarized to include the
following (see Fig. 6):


1. Reliability data compilation and comparison of similar systems.

2. Analysis of project specification compatibility with the field need.

3. Prediction of ease of probable achievement of reliability goals based on
   part counts and failure rate information on available parts.

4. Detailed system design to include specific subsystem reliability goals
   and specifications.

5. Design aid development and summary for assisting in achieving the
   reliability goals.

6. Plans for system and subsystem testing including acceptance criteria, test
   conditions, and facilities assignment.

7. Independent system design review that assesses reliability specifications
   and compliance with system design objectives.

8. Plan of complete reliability program plus completion schedule.

9. Complete system assembly and test coordination plan with specific
   instructions to product designers.


Design Control 


Design control is maintained successfully by means of an organized reliability
engineering operation. Coordinators are assigned to each project and their
full-time direction is handled by a reliability administrator on the chief
engineer's staff in each product line or department. Standards and part
application groups are established and full liaison is maintained with a central
reliability and standards activity. Formal review teams and programs are
established to evaluate alternative designs, and formal use is made of three types
of reliability prediction.


Design reviews, an important part of design control, are performed by a team 
of mature engineers who have had experience in the particular type of design un- 
der review. They are appointed by a review supervisor on each project to serve 
only as long as the need for their specialized experience exists. The purpose of 
the reviews is to focus the thinking of many experienced designers on each design 
problem so that the final design will reflect a concerted opinion toward mature 
design. Alternate designs for equivalent function are thus screened for the
optimum compromise of engineering criteria. As each designer completes his
assignment he lists his design with the review supervisor. Copies of the circuitry


[Figure: block diagram. Stages: Product Design Evaluation; Pilot Model
Evaluation; Release to Production. Output: designs, standards and specifications
for procurement, production operation, testing, and maintenance, and instruction
handbooks.]

Fig. 7 - Design control.


and related analysis material from the engineer's notebook are distributed in
advance to those invited to participate on the review team.


At the review meeting the designer explains his design in detail and
obtains the collective advice of those present. Most designs undergo
many reviews before the design is frozen for inclusion in the project design.

The three basic types of design review are preliminary, progress, and final. The
preliminary reviews analyze a tentative proposal and survey possible
alternatives; the progress reviews examine the design status and audit the
theoretical and analytical justification for each design decision; and the final
reviews confirm the validity of the design approach and approve the compatibility
of individual designs with the over-all project objectives.


Three types of reliability prediction are used in addition to the design
reviews to control the progress of the design project within engineering. The
first type is performed as soon as a circuit or subsystem design has been
tentatively established on paper. This first prediction is entirely theoretical,
since no hardware has been produced except perhaps a few breadboards. The
circuit design itself is analyzed for the probable stress that will be placed on
each part and the probable immediate environment of each part. From this
information the most likely failure rate of each part is deduced and summed to
predict the inherent reliability of the design. Special rating charts have been
designed for this purpose. If this first reliability prediction for each
subsystem meets the goal set for it during the system design, approval is granted
to freeze the design at this point and to proceed with the development of
engineering pilot models. Redesign and design reviews are repeated until this
first reliability prediction meets the assigned goal on each circuit and subsystem.
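
The first prediction described above amounts to summing estimated part failure rates and comparing the sum with the assigned goal. The sketch below assumes each part's rate is a base rate multiplied by a stress factor read from a rating chart. The rating charts are not reproduced here, so the part names, base rates, stress factors, and goals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    base_rate: float      # per cent per 1000 h at rated conditions (hypothetical)
    stress_factor: float  # multiplier for stress and environment (hypothetical)


def predicted_failure_rate(parts):
    """Sum of the stress-adjusted part failure rates for one circuit."""
    return sum(p.base_rate * p.stress_factor for p in parts)


def meets_goal(parts, goal_rate):
    """True if the predicted circuit failure rate does not exceed the goal."""
    return predicted_failure_rate(parts) <= goal_rate


CIRCUIT = [
    Part("V1 tube", 1.0, 1.5),
    Part("R1 resistor", 0.05, 2.0),
    Part("C1 capacitor", 0.1, 1.0),
]

if __name__ == "__main__":
    rate = predicted_failure_rate(CIRCUIT)
    print(f"predicted rate: {rate:.2f} per cent per 1000 h")
    print("freeze design" if meets_goal(CIRCUIT, 2.0) else "redesign and review again")
```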


The second stage of prediction is performed on accumulated circuits as soon
as hardware similar to the final model is available. This should roughly
simulate the general configuration and embodiment of the final equipment. Actual
measurements of environmental conditions are used to interpret new analyses of
part loading and to establish a second prediction of inherent reliability. When
this figure checks with the first prediction and meets the assigned goals, then
approval is granted to manufacture the engineering prototypes. Since this
analysis considers each part in each circuit, a comparison of this data with the
data from the first prediction will quickly identify those areas needing
additional development or redesign. These two analyses likewise provide the
information in regard to the required part failure rates for use in the
preparation of part procurement specifications.
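
The comparison of the first and second predictions can be pictured as a circuit-by-circuit check. The sketch below is one possible reading, not the documented procedure: a circuit is flagged when the second prediction departs from the first by more than a chosen tolerance or exceeds its goal. The circuit names, rates, and the 10 per cent tolerance are assumptions made for illustration.

```python
def areas_needing_attention(first, second, goals, tolerance=0.10):
    """Circuits whose second prediction disagrees with the first by more than
    the tolerance (as a fraction of the first) or exceeds the assigned goal."""
    flagged = []
    for circuit in first:
        drift = abs(second[circuit] - first[circuit]) / first[circuit]
        if drift > tolerance or second[circuit] > goals[circuit]:
            flagged.append(circuit)
    return flagged


# Hypothetical failure rates, in per cent per 1000 h, for three circuits.
FIRST = {"IF amplifier": 1.0, "power supply": 2.0, "video": 1.5}
SECOND = {"IF amplifier": 1.05, "power supply": 2.9, "video": 1.4}
GOALS = {"IF amplifier": 1.2, "power supply": 2.5, "video": 1.5}

if __name__ == "__main__":
    print("needs development or redesign:",
          areas_needing_attention(FIRST, SECOND, GOALS))
```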


Unlike the first prediction which was entirely theoretical and the second 
which was partially so, the third prediction is not theoretical at all but con- 
sists of special reliability tests run on complete engineering prototypes. Stat- 
istical data are obtained on actual failures which must correlate both with pre- 
dictions 1 and 2 and also with the original system goals. Again high failure 
rate areas which show up under design or the use of improper parts are quickly 
located by comparison of the test data with the previous prediction figures.
When the results of prediction 3 correlate with the previous two predictions, and
actual failure rates comply with the initial goals, it is safe to release the de- 
sign for production. 


Engineering reliability is also responsible for qualifying the parts speci- 
fied in the design. The major portion of this responsibility is to ascertain 
that the parts specified will exhibit an acceptably low constant failure rate for 
a suitable period when they are stressed to the conditions involved in the
design. Or in other words, engineering qualification must certify that the sample
parts obey the exponential failure law and have an acceptable failure rate for a 
suitable time when used in the planned application conditions. 
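
Under the exponential failure law, a part with constant failure rate r survives t hours with probability exp(-r t). The sketch below shows that relation and a crude check that failure rates observed in successive test intervals are roughly constant. The test counts and the 50 per cent tolerance are hypothetical, and a real qualification would use a proper statistical test.

```python
import math


def survival_probability(rate_per_hour, hours):
    """Exponential failure law: probability of surviving the given hours."""
    return math.exp(-rate_per_hour * hours)


def interval_failure_rates(failures, part_hours):
    """Observed failure rate (per part-hour) in each test interval."""
    return [f / h for f, h in zip(failures, part_hours)]


def is_roughly_constant(rates, tolerance=0.5):
    """True if every interval rate lies within the tolerance (as a fraction
    of the mean rate) of the mean rate."""
    mean = sum(rates) / len(rates)
    return all(abs(r - mean) <= tolerance * mean for r in rates)


if __name__ == "__main__":
    # Hypothetical test: failures seen in three intervals of 1e6 part-hours.
    rates = interval_failure_rates([5, 4, 6], [1e6, 1e6, 1e6])
    print("constant failure rate:", is_roughly_constant(rates))
    print("survival over 1000 h:", round(survival_probability(5e-6, 1000), 4))
```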


Factory Reliability Control 


Factory reliability control is maintained by means of an organized quality
control operation. Although quality control can do nothing to improve the
reliability of the product above that inherent in the engineering design, it has
the major responsibility of insuring that the manufacturing operation does not
degrade the reliability below the inherent potential of the design. The most
mature design, accompanied by the best part specifications, cannot result in a
reliable equipment unless the manufacturing operation brings out the maximum
potential of the engineering work. Quality control must insure that the various
reliability goals established during the design operation are met during the
factory phase.


It was mentioned earlier that engineering must qualify the parts specified 
to have a suitable life characteristic. Often the parts which receive
engineering qualification have been hand-made in the supplier's model shop. Quality
control must insure that subsequent production lots of parts retain the same life 
characteristics and low failure rates. For very reliable complex equipment the 
problem of procuring parts to a known suitable low failure rate is very diffi- 
cult. Special incoming screening operations and elaborate acceptance tests which 
determine the time stability of the incoming parts must be planned and main- 
tained. A thorough study of the mechanism of failure and the design of special 
tests based on these findings are major responsibilities of quality control. 


There are essentially five control loops in the factory reliability control 
operation: part and component procurement, acceptance testing and incoming in- 
spection, materials handling, production inspection, and production type-tests of 
equipment. These loops and related factors for reliability control can be
summarized, as shown in Fig. 8. Equally important from the long-range viewpoint,
but not shown in Fig. 8, is the responsibility for all those concerned in the 
vendor coordination program and parts procurement to pass the burden of making 
and supplying certified better parts back to the part manufacturer. 


Fig. 8 - Factory reliability control. [Figure. Legible labels: Part and Component Procurement; Acceptance Testing and Inspection; Materials Handling; Production Inspection; Production Type-tests (Equipment); Release to Product Warranty. Quality Control Output: guarantee that the manufactured product achieves the potential reliability inherent in the design.] 


Product Evaluation Control 


Product evaluation control provides for a thorough reliability evaluation of 
all equipment following production and prior to shipping. Special reliability 
tests under controlled conditions yield an accurate measure of the inherent reli- 
ability of the product. This measured figure constitutes a fourth prediction 
which must compare favorably with the first three if quality control has succeed- 
ed in its mission. It is important, therefore, that an independent, unbiased 
group perform this control operation. 


If the actual failure rates as measured on final production equipment (while 
these are undergoing simulated field tests) do not coincide with the earlier pre- 
dictions, a comparison of the tabulated facts for each circuit both before and 
after manufacture will reveal the areas needing better factory control or the use 
of better parts. When all four predictions coincide with the contract and system 
goals, there can be considerable confidence that the equipment development has 
approached maturity on the first production run. 


Fig. 9 - Product evaluation control. [Figure. Legible labels: Final Product Examination and Analysis; Release for Shipment. Output: information and specification conformance assurance; also replacement part data and maintenance instructions.] 
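
The circuit-by-circuit comparison of predicted and measured failure rates described above can be sketched as a small tabulation. This is only an illustration under stated assumptions: the circuit names, the rates, and the 1.5-to-1 tolerance ratio are invented, and the paper does not prescribe a particular threshold or procedure.

```python
# Hypothetical illustration: circuit names, rates, and the 1.5x tolerance
# are assumptions, not values from the paper.

def flag_circuits(predicted, measured, tolerance=1.5):
    """Return (circuit, ratio) pairs for circuits whose measured failure
    rate exceeds the predicted rate by more than the tolerance ratio,
    worst offenders first."""
    flagged = []
    for circuit, predicted_rate in predicted.items():
        ratio = measured[circuit] / predicted_rate
        if ratio > tolerance:
            flagged.append((circuit, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Failure rates in percent per 1000 hours (made-up numbers).
predicted = {"power supply": 0.40, "i-f amplifier": 0.25, "video output": 0.10}
measured = {"power supply": 0.44, "i-f amplifier": 0.50, "video output": 0.30}

for circuit, ratio in flag_circuits(predicted, measured):
    print(f"{circuit}: measured/predicted = {ratio:.1f}")
```

Circuits that appear in the flagged list are candidates for better factory control or better parts; circuits within tolerance support the earlier predictions.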


The factors related to the product evaluation control are illustrated in 
Fig. 9. It can be seen that the very important outputs of this control loop are 
warranty information for the product, assurance of specification and contract 
conformance, and confirmation of part failure rates. The portion of the in- 
struction book on trouble shooting should be written to take advantage of the ex- 
perience gained during this operation. Preventive maintenance schedules should 
also be verified by the results of this control phase or modified accordingly. 
Repeated product evaluation tests will assure a uniformly reliable product. 


Field Results Control 


Earlier sections described the objective of this multiple sequential feed- 
back approach as obtaining mature designs on each first production run. 
If this objective is achieved on a given project, field results can 
only confirm the success. Negative confirmation from field results may reveal 
valuable information about the effective maintainability of the design, the op- 
eration or maintenance ability of the field crews, weaknesses in the contract 
specification or evaluation test conditions, and misapplication of the equip- 
ment. Many uncontrolled parameters in field applications can introduce a variety 
of conflicting evidence into the records of field results. 


The effective program for field results control will establish analytical 
and statistical measures for categorizing and evaluating each of the contributing 
factors. In addition to the information obtained about the field application 
conditions and the state-of-the-art in the field, each project can certainly pro- 
vide information helpful to other future projects. Many of the factors involved 
in this sixth major control loop are illustrated in Fig. 10. 


Fig. 10 - Field results control. [Figure. Legible labels: Accumulation of Data; Special Data and Failed Part Field Surveys; Laboratory; Field Test Simulation. Listed results: confirmation; information on operation and maintenance ability of field crews; improved factory test criteria; correlation results between equipments; state of the art information; field need status for future contracts.] 
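
One simple statistical measure for categorizing field results is the share of reports attributed to each contributing factor. The sketch below is a minimal illustration, not a method given in the paper: the category names paraphrase the factors named in the text, and the report data is invented.

```python
from collections import Counter

# Category names paraphrase the factors named in the text; the report data
# below is invented for illustration.
CATEGORIES = (
    "design maintainability",
    "crew operation or maintenance",
    "specification or test weakness",
    "misapplication",
)


def category_shares(reports):
    """Return the fraction of field reports falling in each category."""
    unknown = set(reports) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    counts = Counter(reports)
    total = len(reports)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}


reports = [
    "misapplication",
    "crew operation or maintenance",
    "misapplication",
    "design maintainability",
]
for category, share in category_shares(reports).items():
    print(f"{category}: {share:.0%}")
```

Separating the reports this way keeps failures caused by misapplication or crew practice from being charged against the design or the factory.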


CONCLUSION 


Most attempts to improve the classical approach to more reliable equipment 
fail to consider the major weakness of the single long-time loop. Since the es- 
sence of meeting the modern need is speed of design maturity, an entirely new 
approach such as described herein is needed. 


The multiple sequential feedback approach establishes new criteria for, and 
new concepts of, good practice in all development and production specialty fields. 
Statistical analyses and multiple predictions, as well as formal reviews by 
groups of experienced specialists, reveal the best engineering compromises and 
the best production practices. The age of cut-and-try has given way to the age 
of formal analysis and prediction. Completely new standards are necessary and 
available for modern good engineering and good production practice. 




STUDENTS: ATTENTION 


DR. WILLIAM G. TULLER MEMORIAL AWARD 


Awards of $250 will be made for the best papers 
submitted by a senior or graduate student on the 
subject of component parts. The subject may 
relate to operational theory, materials, con- 
struction, design, testing or application of any 
electronic component. Papers must be submitted 
by December 31, 1958. Consult the Office of the 
Dean or your IRE Faculty Advisor for additional 
information or write to: 


Leon Podolsky, Chairman 

Awards Committee 

IRE Professional Group on Component Parts 
Sprague Electric Company 

North Adams, Massachusetts 




